A CONTRIBUTION TO ESKIMO CRANIOLOGY BASED ON
PREVIOUSLY PUBLISHED MEASUREMENTS 


By G. M. MORANT, D.Sc. 


1. Introduction. Among modern races of man the physical type of the Eskimos is undoubtedly one of the most specialized known. The available metrical data relating to it are far more extensive for the cranium than for any other part of the skeleton, or for the living people. Numerous studies of series of Eskimo skulls have been published in the last sixty years, but the majority of these are of little value for statistical purposes, either because the measurements given are inadequate, or because the series described are too small, or for both reasons. Until recent years the monumental Crania Groenlandica of Professors Fürst and Hansen, published in 1915, was by far the most valuable contribution to the subject, but it relates only to Eastern Eskimos. The publication by Dr Hrdlička, between 1924 and 1929, of data relating to a large number of Western and Central Eskimo skulls satisfied a long-felt need. Apart from comparative material for other races, these two sets of material are the only ones dealt with in the present paper. When first presented, the measurements discussed were treated only by the crudest statistical methods.


2. Measurements of Eskimo Skulls provided by Dr Aleš Hrdlička. Dr Hrdlička has published several contributions to Eskimo craniology, but when considering a statistical treatment of his material it is only necessary to refer to two of these:

(a) “Catalogue of Human Crania in the United States National Museum 
Collections”, Proceedings of the United States National Museum, txt (1924), 
pp. 1-51. This, the first part of the extensive catalogue, contains individual measurements of a number of series of Eskimo skulls from different localities. Only two of the male series are sufficiently long for statistical purposes, viz. one of 40 Greenland Eskimo crania and another of 159 Eskimo crania from the north coast of St Lawrence Island in the Bering Sea. One of the former came from the Noursoak Peninsula on the west coast of Greenland, and no localities are recorded for the other specimens in that series. No particulars regarding the discovery or age of the material are provided.


(b) “Anthropological Survey in Alaska”, Forty-Sixth Annual Report of the Bureau of American Ethnology, 1928-9, pp. 19-374. This provides (pp. 254-99) a detailed discussion and mean measurements of a considerable number of groups of Eskimo crania, the majority being made up of small numbers of specimens. All the material of this kind previously described by the author is apparently
included here, and there is a considerable amount of new material. The specimens appear to be of modern or recent date, but one series, discussed at greater length than the others (pp. 318-29), is believed to represent a population which lived near Point Barrow, on the north coast of Alaska, before contact with Europeans was established. This is known as the “Old Igloos” series. Dr Hrdlička says that for the purpose of this report he re-measured all the specimens which he had previously described. The means for the Greenland and St Lawrence Island series differ somewhat from those in the 1924 Catalogue, and the numbers of specimens on which these means are based were also changed. It is said that the individual measurements for all the material will be given in a part of the Catalogue which has not yet been published.

Before making statistical comparisons between the different series, it is necessary to comment on the definitions followed in determining the measurements. Dr Hrdlička has published an account of his technique,* which is based on that of the Monaco Congress of 1906 with several modifications. A list of the measurements given for the Eskimo series follows, the numbers preceded by I.A. being those of the International Agreement and the letters those of the biometric technique:


(i) Maximum “glabello-occipital” length: I.A. 1. This is not precisely the same as L, defined to be the maximum calvarial length from the glabella in the median sagittal plane, but the two definitions will give readings which are either identical or very close to one another.


(ii) Maximum breadth above the mastoids and roots of zygomae: I.A. 3. 
This, again, is not precisely the same as B, defined to be the maximum transverse 
diameter on the parietal bones, but the two definitions will almost invariably give 
identical or closely similar readings. 


(iii) Basion-bregma height: I.A. 4, a. Although the basion is insufficiently 
defined, this measurement may be supposed the same as H’. 


(iv) Cranial capacity: I.A. 24, c. Hrdlička determined this with seed, using a method which he describes in detail. It is commonly found that different methods give appreciably different results.


(v) Upper facial height from nasion to alveolar point: I.A. 12, G′H. The alveolar point is defined to be the “lowest point of the alveolar border between the two median upper incisors”.


(vi) Facial length from basion to alveolar point. Hrdlička gives this definition without comment, and it may be presumed that he used the same alveolar point as in finding the upper facial height, so the measurement is GL, assuming that the basions used are the same. He diverges here from the Monaco definition (I.A. 10), which specifies that the “alveolar point” used in this case is the “median point of the anterior border of the alveolar arch”, i.e. Martin’s prosthion.

* “Anthropometry. D. Skeletal Parts: the Skull”, American Journal of Physical Anthropology, II (1919), pp. 401-28. This article was reprinted in the author’s Anthropometry, Philadelphia (1920). An English translation of the Monaco report is given in the same volume of the Journal and in the book.

(vii) Chord from nasion to basion: I.A. 9. This may be supposed the same as 
LB, on the supposition that the same basions are intended. 

(viii) Maximum bizygomatic breadth: I.A. 8, J. 

(ix) Nasal height. All the definitions agree in using the nasion as the superior terminal of this measurement. According to the Monaco technique (I.A. 13) the inferior terminal is “the middle of a line connecting the lowest points of the two nasal fossae”, which is inexact, as the middle point will normally lie in the nasal spine and not on any surface. Hrdlička’s practice is to “measure separately to each subnasal point and record the mean”, the subnasal points being defined as “the lowest point, on each side, on the lower border of the nasal aperture, i.e. the lowest points anteriorly of the two nasal fossae”. This measurement is likely to give such closely similar readings that it may be supposed the same as NH, L.

(x) Maximum breadth of nasal aperture: I.A. 14, NB.

(xi) Orbital breadth: I.A. 16. The terminal of this measurement nearest to the median sagittal plane is the dacryon, or “if the dacryon is obliterated, or in an abnormal situation, take the point where the posterior lacrymal crest meets the inferior border of the frontal”. The lateral terminal is “the external border of the orbit, at the point where the transverse axis of the orbit meets the border, and parallel as far as possible to the superior and inferior borders”. This is an inadequate definition, since there is no exact way of deciding when the dacryon is in an abnormal situation, and the point sometimes substituted for it normally gives a lesser breadth than that from the true dacryon. Hrdlička follows the Monaco instructions, and only gives data for the mean of the two orbital breadths so obtained. His measurement may be denoted by O₁′ (or more precisely by ½(O₁′R + O₁′L)), though it is not exactly the same as the true dacryal breadth.

(xii) Orbital height: I.A. 17. This is the maximum height perpendicular to the breadth, and it may be supposed the same as O₂. Hrdlička gives the mean of the heights of the two orbits.


These are the only absolute measurements provided by Dr Hrdlička for Eskimo skulls which are dealt with below. He gives data for five additional measurements determined according to the Monaco definitions, viz. the length and breadth of the “upper alveolar arch” and the chord from basion to subnasal point, for which there is little comparative material; the menton-nasion height, an unreliable measurement because it is influenced by wear of the teeth; and the height of the mandible at the symphysis. The omission of a number of customary measurements (such as the principal arcs, the minimum frontal breadth and the palatal and foraminal measurements) is to be regretted. The indices and angles in curled brackets in the tables below were obtained from the mean values of the component lengths (indices) or sides of the triangle (angles) instead of from values for individual skulls. The angles N∠, A∠ and B∠ are the three angles of the fundamental triangle of which the sides are G′H, GL and LB. Hrdlička gives means for the first, which he calls the facial angle, and some of our means for it differ quite markedly from his.
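The bracketed values can be illustrated with a short calculation. The Python sketch below is not the author's procedure but an illustration under stated assumptions: an index such as 100 H′/L is formed directly from two series means, and the three angles of the facial triangle are found from the mean chords by the law of cosines, treating the triangle as plane. The function names are hypothetical. With the St Lawrence Island means of Table II the two indices come out at 74·3 and 103·7, as printed; the angles (about 68°·4, 67°·4 and 44°·2) are close to, but not identical with, the bracketed values in that table, presumably because those were computed from the means of the 131 skulls for which all three chords are available.

```python
import math


def index_from_means(numerator_mean: float, denominator_mean: float) -> float:
    """Index 100 X/Y formed from two series means (a 'curled bracket' value)."""
    return 100.0 * numerator_mean / denominator_mean


def triangle_angles(lb: float, gl: float, gh: float) -> tuple[float, float, float]:
    """Angles in degrees at nasion, alveolar point and basion.

    Sides of the facial triangle: LB (nasion-basion), GL (basion-alveolar
    point) and G'H (nasion-alveolar point). Each angle is found from the
    law of cosines using the side opposite it.
    """
    # Angle at nasion (N) is opposite GL.
    n_angle = math.degrees(math.acos((lb**2 + gh**2 - gl**2) / (2 * lb * gh)))
    # Angle at alveolar point (A) is opposite LB.
    a_angle = math.degrees(math.acos((gl**2 + gh**2 - lb**2) / (2 * gl * gh)))
    # Angles of a plane triangle sum to 180 degrees, giving the angle at basion (B).
    b_angle = 180.0 - n_angle - a_angle
    return n_angle, a_angle, b_angle


if __name__ == "__main__":
    # St Lawrence Island (W2) means from Table II.
    print(round(index_from_means(136.8, 184.0), 1))  # 100 H'/L -> 74.3
    print(round(index_from_means(141.9, 136.8), 1))  # 100 B/H' -> 103.7
    angles = triangle_angles(lb=103.6, gl=104.3, gh=78.2)
    print([round(a, 1) for a in angles])
```

Forming indices from means rather than averaging individual indices is what distinguishes the bracketed entries; the two approaches generally give slightly different results.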

There is reason to suspect that he modified the ways in which some of his measurements were taken between the times when the data were obtained for the 1924 Catalogue and for the 1928 Report. This is suggested by a comparison of the means for the two male series given in the former year with those for the same series given after re-measurement, a number of additional specimens having been added in one case. There is a close agreement between corresponding means except in the case of the following characters:
Series | Year | C | G′H | J | NH | 100 NB/NH
Greenland | 1924 | 1560 (34) | 74·4 (36) | 140·0 (30) | 53·4 (39) | 42·9 (36)
Greenland | 1928 | 1518 (42) | 76·1 (46) | 140·5 (47) | 52·4 (48) | 44·3 (48)
St Lawrence Island | 1924 | 1506 (129) | 76·6 (144) | 140·8 (151) | 55·4 (150) | 44·6 (150)
St Lawrence Island | 1928 | 1462 (142) | 78·2 (130) | 142·0 (148) | 54·2 (148) | 45·2 (148)

The differences may be partly due to the fact that the corresponding means are given for different numbers of specimens, and the sexes of some of them may have been changed, but it is impossible to avoid the conclusion that the divergences for these characters are due primarily to a change in the definitions of the measurements. No other hypothesis appears able to explain why the facial heights increased on re-measurement while the nasal heights decreased, or why the capacities decreased while the major calvarial chords remained practically unchanged. The differences which must be attributed to personal equation are large enough to be disturbing when an attempt is made to distinguish small differences between neighbouring Eskimo types. The only means of Dr Hrdlička’s series used below are those derived from the data given in his 1928 Report, but this contains no individual measurements, and the standard deviations of two of the series in our Table I were obtained from the readings in the 1924 Catalogue.

3. Measurements of Greenland Eskimo Skulls provided by Professors Carl M. Fürst and F. C. C. Hansen. The Crania Groenlandica* of these authors is one of the most valuable and comprehensive treatises of its kind available for any race. Descriptions and detailed individual measurements of 380 crania are given, 14 of these being immature and 8 others unsexed. The sample forms a selection from the total Eskimo population of Greenland, the more densely populated west coast being represented by larger numbers of specimens than the south-west and

* Crania Groenlandica, A Description of Greenland Eskimo Crania with an Introduction on the 
Geography and History of Greenland, Copenhagen (1915). 
east coasts. The distributions and means are, unfortunately, given for the combined male and female series in the case of all characters except the cephalic index. A biometric treatment of the male series has been given by the present writer.* Only the constants for those characters which are available for Hrdlička’s series are considered below. Fürst and Hansen’s orbital breadth, “from the lateral to the vertical medial margin, which is the direct continuation of the lower orbital margin and which in the Greenlanders is continued sharply and often high on the maxillary bone”, may be supposed the same as the biometric orbital breadth O₁, and this is a different measurement from the dacryal breadth given by Hrdlička. Except in the case of the nasal height, there is no doubt as to whether the measurements of the two sets of data may be considered the same in definition or not. Fürst and Hansen do not define the way in which they determined this measurement, but it may be presumed that it was the same as, or very similar to, that employed by Hrdlička. Their cranial capacities were found with the aid of millet seed and a graduated glass cylinder, and they may also be supposed comparable to his, which were determined by a slightly different technique.


4. The Variabilities of Male Eskimo Series. Standard deviations are given in Table I for the only two Eskimo series measured by Hrdlička for which individual measurements have been published (in the 1924 Catalogue), for Fürst and Hansen’s Greenland series (the constants being taken from the paper cited), and for the long Egyptian series often used for comparative purposes.† Constants for more than twice as many characters have been published for the last two series. The data relate to 13 characters and the four series give 6 comparisons in pairs for each, so there is a total of 78 comparisons. In 29 cases the differences exceed three times their probable errors, and several of them are markedly significant, the highest ratio of a difference to its probable error being 9·6. Twelve of the 13 σ’s for the St Lawrence Island series are less than the corresponding values for the Egyptian, and the differences exceed three times their probable errors in 8 of these cases; 11 of the σ’s for Hrdlička’s Greenland series are less than the Egyptian values, and the differences exceed three times their probable errors in 4 of these 11 comparisons. But the σ’s for Fürst and Hansen’s Greenland series exceed the Egyptian in the case of 10 of the 13 characters, and for 4 of these 10 the differences may be supposed significant. These divergences in variability are more marked than those usually found in the comparison of cranial series believed to be racially homogeneous. This is evidently due to the fact that Hrdlička’s two series show peculiarly small variation (only one character showing a significant difference in the comparison between them), while the other two show a greater and more common order of variation. It has been found


* In “Studies of Palaeolithic Man. I. The Chancelade Skull and its Relation to the modern Eskimo Skull”, Annals of Eugenics, I (1926), pp. 257-76.

† Karl Pearson and Adelaide G. Davin, “On the Biometric Constants of the Human Skull”, Biometrika, XVI (1924), pp. 328-63.
for other material that the populations of small islands, such as the Guanche of Tenerife, are distinctly less variable than others. The distinction of Hrdlička’s Greenland series in the respect considered may be due to the fact that the skulls,


TABLE I
Standard Deviations of Male Series of Crania*

Character | St Lawrence Island, Eskimo (Hrdlička) | Greenland, Eskimo (Hrdlička) | Greenland, Eskimo (Fürst and Hansen) | Egyptian, 26th-30th dynasties (Pearson and Davin)
C | 103·6 ± 4·4 (129) | 86·0 ± 7·0 (34) | 128·8 ± 4·6 (175) | 113·5 ± 2·0 (753)
L | 4·96 ± ·19 (158) | 4·36 ± ·34 (38) | 5·81 ± ·20 (191) | 5·72 ± ·09 (895)
B | 4·05 ± ·15 (156) | 4·34 ± ·34 (36) | 4·52 ± ·16 (191) | 4·76 ± ·08 (896)
H′† | 4·25 ± ·17 (143) | 3·97 ± ·30 (39) | 4·79 ± ·17 (183) | 5·03 ± ·08 (884)
G′H | 3·32 ± ·13 (144) | 3·96 ± ·31 (36) | 4·39 ± ·15 (191) | 4·15 ± ·07 (845)
J | 4·86 ± ·19 (151) | 5·74 ± ·50 (30) | 6·48 ± ·23 (185) | 4·57 ± ·08 (785)
NH | 2·39 ± ·09 (150) | 2·69 ± ·21 (39) | 3·10 ± ·11 (192) | 2·92 ± ·05 (898)
NB | 1·74 ± ·07 (153) | 1·67 ± ·13 (36) | 1·75 ± ·06 (191) | 1·77 ± ·03 (893)
O₂‡ | 1·62 ± ·06 (148) | 1·84 ± ·14 (38) | 2·03 ± ·07 (188) | 1·88 ± ·03 (888)
O₁§ | 1·52 ± ·06 (148) | 0·92 ± ·07 (38) | 2·46 ± ·09 (189) | 1·65 ± ·03 (880)
100 B/L | 2·39 ± ·09 (156) | 3·07 ± ·25 (35) | 3·00 ± ·10 (190) | 2·68 ± ·06 (884)
100 NB/NH | 3·32 ± ·13 (150) | 3·74 ± ·30 (36) | 3·84 ± ·13 (191) | 3·82 ± ·06 (881)
100 O₂/O₁‖ | 3·70 ± ·15 (148) | 4·83 ± ·37 (38) | 5·60 ± ·19 (189) | 4·95 ± ·08 (876)


* The ± signs in this paper indicate probable errors, as in all earlier anthropometric papers in Biometrika.

† The Egyptian σ is for the vertical height from the basion (H) instead of H′. Both these measurements are available for the male Eskimo series measured by Fürst and Hansen, the σ’s being 4·78 ± ·17 for H and 4·79 ± ·17 for H′.

‡ The σ’s are for the means of the right and left orbital heights in the case of Hrdlička’s series, and for the height of the left orbit in the case of the other two series.

§ The σ’s are for the means of the right and left orbital breadths from the dacrya (O₁′) in the case of Hrdlička’s series, and for the maximum breadths from the medial margin, for the left orbit only (O₁, L), in the case of the other two series. Both orbital breadths have been given for a few long series and the σ’s for them have been found to be in close agreement.

‖ The σ’s are for the orbital indices found from the means of the heights and dacryal breadths for the right and left sides (100 O₂/O₁′) in the case of Hrdlička’s series, and for 100 O₂/O₁, L in the case of the other two. These also show a close agreement when found for the same series.


of unknown origin, came entirely or in large part from a small Eskimo community. Fürst and Hansen’s series may be considered a sample drawn, more or less at random, from the total Eskimo population of the country. In calculating all the coefficients of racial likeness given in this paper the Egyptian standard deviations were used. These values are probably close to those which would be found for the majority of the series measured by Hrdlička.
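The significance statements in this section rest on probable errors. The Python sketch below assumes the usual large-sample formula p.e.(σ) = 0·6745 σ/√(2n), and assumes that the probable errors of two independent σ's combine as the square root of the sum of their squares; the function names are hypothetical. It reproduces the probable error ·34 attached to σ of L for Hrdlička's Greenland series in Table I, confirms the count of 78 comparisons (13 characters times the 6 pairs that can be formed from four series), and shows that the σ of L for the St Lawrence Island series differs from the Egyptian value by more than three times its probable error.

```python
import math

PE_FACTOR = 0.6744898  # probable error = 0.6745 x standard error


def pe_of_sd(sd: float, n: int) -> float:
    """Probable error of a standard deviation computed from n observations."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


def ratio_to_pe(x1: float, pe1: float, x2: float, pe2: float) -> float:
    """Difference of two independent estimates divided by its probable error."""
    return abs(x1 - x2) / math.hypot(pe1, pe2)


if __name__ == "__main__":
    # 13 characters, each compared for every pair of the four series.
    n_comparisons = 13 * math.comb(4, 2)
    print(n_comparisons)  # 78
    # sigma of L for Hrdlicka's Greenland series (Table I): 4.36 from 38 skulls.
    print(round(pe_of_sd(4.36, 38), 2))  # 0.34
    # sigma of L, St Lawrence Island (4.96, n = 158) v. Egyptian (5.72, n = 895).
    r = ratio_to_pe(4.96, pe_of_sd(4.96, 158), 5.72, pe_of_sd(5.72, 895))
    print(round(r, 1), r > 3)
```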


5. Comparisons of Eskimo Series by the Method of the Coefficient of Racial Likeness. Dr Hrdlička’s 1928 Report contains individual or mean measurements of 36 groups of male Eskimo skulls ranging in size from one to 153 individuals. Most of the series are made up of fewer than 10 specimens, and pooling of some of the material is obviously required as a preliminary step towards statistical analysis. Three Western groups were first compiled, in two cases by combining the skulls from neighbouring localities in order to make up samples of sufficient size. The approximate positions of the sites can be seen from the map in Fig. 1, and the numbers in brackets below give the numbers of male skulls from each site. The Western groups are:

W₁: Prince William Sound (1), Kodiak Island (1), Unalaska Peninsula (1), Nushagak Bay (1), Togiak (4), Mumtrak (4), Nelson Island (9), Hooper Bay (9), Lower Yukon and delta (3), Pilot Station, lower Yukon (3), Kotlik and Pastolik (11) and St Michael Island (8).

W₂: St Lawrence Island (153).

W₃: Little Diomede Island (5), and two sites on the mainland of Asia, Indian Point (14) and Puotin (2).

The means for these three groups are given in Table II and they show a remarkably close resemblance for all characters. The crude coefficients of racial likeness are: W₁ and W₂, −·06 ± ·22 (18);* W₁ and W₃, −·49 ± ·23 (17);† and W₂ and W₃, −·50 ± ·23 (17).† As far as can be seen from the data available, the Eskimo population of the south-west of Alaska, St Lawrence Island and the Asiatic mainland is perfectly homogeneous. Within this area there is no evidence of local populations differing significantly from the prevailing type, though local variants may well exist. There is only one skull from Kodiak Island, for example, and if 50 were available their measurements might well distinguish the population from that of St Lawrence Island. For the present, the pooling of the three groups W₁, W₂ and W₃ appears to be justified, and the combined means are those of the Western series in Table IV. It will be shown that they are not closely similar to those for any other series available.
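The crude coefficients quoted here are assumed to be of the form introduced by Karl Pearson: for M characters, C = (1/M) Σ [n n′/(n + n′)] ((x̄ − x̄′)/σ)² − 1, with probable error 0·6745 √(2/M), σ being taken from a reference series (in this paper, the Egyptian series). The Python sketch below is an illustration under that assumption, not code from the paper, and its input layout is hypothetical. The probable errors it gives for 18, 17 and 16 characters (·22, ·23 and ·24) agree with those attached to the coefficients in this paper.

```python
import math

PE_FACTOR = 0.6744898


def crude_crl(characters):
    """Crude coefficient of racial likeness and its probable error.

    `characters` holds one (mean1, n1, mean2, n2, sd) tuple per character,
    where sd is the standard deviation of a reference series.
    """
    m = len(characters)
    if m == 0:
        raise ValueError("at least one character is required")
    total = 0.0
    for mean1, n1, mean2, n2, sd in characters:
        weight = n1 * n2 / (n1 + n2)  # n n' / (n + n')
        total += weight * ((mean1 - mean2) / sd) ** 2
    # Each term has expectation 1 for two samples of one population,
    # so subtracting 1 centres the coefficient near zero.
    coefficient = total / m - 1.0
    probable_error = PE_FACTOR * math.sqrt(2.0 / m)
    return coefficient, probable_error


if __name__ == "__main__":
    for m in (18, 17, 16):
        dummy = [(100.0, 50, 100.0, 50, 5.0)] * m  # identical means
        c, pe = crude_crl(dummy)
        print(m, c, round(pe, 2))
```

Identical means give the minimum value of −1, which is why small negative coefficients such as those above indicate very close resemblance.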

Three groups of Eskimo skulls from the north-west and north of Alaska were made up in the following way, these being distinguished from the Western groups because their mean measurements clearly differentiate them:

NW₁: Golovnin Bay (3), Cape Nome (1), Sledge Island (5), Port Clarence (4), Wales (19), Shishmaref (13), Kotzebue (2).

NW₂: Barrow and vicinity (37).

NW₃: Point Barrow (49).


The means for these groups, given in Table II, again show a remarkably close resemblance for all characters. The crude coefficients of racial likeness between them are: NW₁ and NW₂, −·04 ± ·23 (17); NW₁ and NW₃, ·56 ± ·23 (17); and NW₂ and NW₃, −·18 ± ·23 (17).† The pooling of the three groups again appears to be fully justified, and accordingly the combined means were computed and they


* For all the characters in Table II except GZ and BZ.
† For all the characters in Table II except C, GZ and BZ.
are those for the North-Western series in Table IV. It will be shown that they are differentiated from those for all the other series available. It is surprising to find that all the material hitherto considered in this section can be partitioned with so little hesitation into two homogeneous series which are distinctly different from one another. A crude comparison of the means in Table II shows that this


TABLE II
Mean Measurements of Groups of Male Eskimo Skulls from Alaska*

Western groups: first three columns. North-Western groups: last three columns.

Character | St Lawrence Island (W₂)† | South-Western (W₁) | Asiatic and Diomede (W₃) | Norton Sound–Kotzebue (NW₁) | Barrow and vicinity (NW₂) | Point Barrow (NW₃)
C | 1462 (142) | 1503·1 (53) | 1470 (5) | 1448·4 (40) | 1324 (5) |
L | 184·0 (153) | 183·3 (55) | 185·1 (21) | 187·3 (47) | 189·0 (37) | 187·4 (49)
B | 141·9 (153) | 142·1 (55) | 143·2 (21) | 136·6 (47) | 137·3 (37) | 138·4 (49)
H′ | 136·8 (145) | 135·9 (55) | 137·2 (20) | 138·0 (45) | 137·8 (35) | 137·8 (47)
LB | 103·6 (145) | 103·8 (54) | 104·6 (19) | 106·5 (45) | 106·1 (35) | 105·4 (47)
GL | 104·3 (131) | 103·7 (43) | 104·4 (14) | 106·0 (39) | 103·9 (21) | 103·9 (36)
G′H | 78·2 (139) | 78·6 (47) | 78·3 (17) | 77·6 (39) | 78·9 (21) | 78·6 (37)
J | 142·0 (148) | 142·1 (52) | 141·9 (21) | 141·8 (42) | 143·4 (26) | 142·6 (44)
NH | 54·2 (148) | 54·4 (54) | 55·0 (21) | 54·0 (44) | 55·2 (29) | 54·8 (46)
NB | 24·5 (148) | 24·2 (54) | 25·0 (21) | 23·8 (44) | 23·9 (29) | 23·1 (46)
O₂ | 36·8 (145) | 36·7 (54) | 37·0 (21) | 36·3 (44) | 36·0 (28) | 36·1 (43)
O₁′ | 40·3 (145) | 40·0 (54) | 40·6 (21) | 40·6 (44) | 40·4 (28) | 40·2 (43)
100 B/L | 77·1 (153) | 77·6 (55) | 77·4 (21) | 73·0 (47) | 72·6 (37) | 73·9 (49)
100 H′/L | {74·3 (145)} | {74·1 (55)} | {74·1 (20)} | {73·7 (45)} | {72·9 (35)} | {73·5 (47)}
100 B/H′ | {103·7 (145)} | {104·6 (55)} | {104·4 (20)} | {99·0 (45)} | {99·6 (35)} | {100·4 (47)}
100 NB/NH | 45·2 (148) | 44·5 (54) | 45·4 (21) | 44·2 (44) | 43·4 (29) | 42·2 (46)
100 O₂/O₁′ | 91·2 (145) | 91·8 (54) | 91·1 (21) | 89·5 (44) | 89·2 (28) | 89·9 (43)
N∠ | {68°·2 (131)} | {67°·6 (43)} | {67°·8 (14)} | {68°·0 (39)} | {66°·3 (21)} | {66°·7 (36)}
A∠ | {67°·5 (131)} | {67°·9 (43)} | {68°·2 (14)} | {69°·2 (39)} | {69°·7 (21)} | {69°·2 (36)}
B∠ | {44°·3 (131)} | {44°·5 (43)} | {44°·0 (14)} | {42°·8 (39)} | {44°·0 (21)} | {44°·1 (36)}

* The indices and angles in curled brackets were derived from the means of the chords involved, instead of from values for individual skulls.

† The locations of the groups are shown on the map in Fig. 1.


step is entirely reasonable. In the case of L, B, 100 B/L and 100 B/H′ the means for the three Western sub-groups cover a small range and those for the three North-Western sub-groups cover another small range, while there is a clear separation of the two ranges. For C, H′, LB, NB, O₂, 100 H′/L, 100 NB/NH, 100 O₂/O₁′ and A∠ the two ranges are also discrete, but the separation between them is less clear. For the remaining characters (GL, G′H, J, NH, O₁′, N∠ and B∠) the two ranges overlap, but the differences are so small that all those between pairs of the six groups are probably insignificant. The two major groups are thus clearly defined and clearly distinguished. It would have been expected from geographical considerations (see Fig. 1) that the sub-group NW₁ would bear a closer resemblance to the three Western sub-groups than NW₂ and NW₃ would, but there is no suggestion of this from the measurements. There appears in fact to be a distinct cleavage between the types of the south-west and north-west coasts of Alaska, though more abundant material would be needed in order to ascertain with precision where the dividing line comes. Some of the series from neighbouring sites making up the sub-groups W₁ and NW₁ are so small that a few might be transferred from one to the other without affecting the position appreciably.
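The partition argument reduces to asking, character by character, whether the three Western means and the three North-Western means occupy non-overlapping ranges. A minimal Python sketch for a few characters of Table II follows; the dictionary layout and function name are assumptions for illustration, and the paper's further distinction between a "clear" and a "less clear" separation is a matter of judgement that is not coded here.

```python
# Means from Table II: Western (W2, W1, W3) and North-Western (NW1, NW2, NW3).
WESTERN = {
    "L": (184.0, 183.3, 185.1),
    "B": (141.9, 142.1, 143.2),
    "O2": (36.8, 36.7, 37.0),
    "J": (142.0, 142.1, 141.9),
    "NH": (54.2, 54.4, 55.0),
}
NORTH_WESTERN = {
    "L": (187.3, 189.0, 187.4),
    "B": (136.6, 137.3, 138.4),
    "O2": (36.3, 36.0, 36.1),
    "J": (141.8, 143.4, 142.6),
    "NH": (54.0, 55.2, 54.8),
}


def ranges_discrete(a, b) -> bool:
    """True if the ranges spanned by the two sets of means do not overlap."""
    return max(a) < min(b) or max(b) < min(a)


if __name__ == "__main__":
    for character in WESTERN:
        discrete = ranges_discrete(WESTERN[character], NORTH_WESTERN[character])
        print(character, "discrete" if discrete else "overlap")
```

L, B and O₂ come out discrete and J and NH overlapping, in line with the classification given in the text.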

The means of three other series of male Eskimo skulls from Alaska are given by Dr Hrdlička in his 1928 Report. They fall within the same area as the Western and North-Western series dealt with above, but they were kept apart from these as their mean measurements evidently differentiate them. The first is made up of 46 specimens from Nunivak Island, the second of 131 from Point Hope and the third is the series of 27 “Old Igloos” skulls from the vicinity of Point Barrow. This last is believed to represent an earlier population than all the others, and it was kept separate on this account, and also because the type it represents is clearly distinguished from all the others determined by Alaskan series. A number of small groups from the islands west of Greenland and the Canadian mainland were pooled to form what will be called the Central Eskimo series. They are: Northern Arctic (5), Melville Peninsula (1), Southampton Island (9), Hudson Bay and Ungava Bay (5), Baffin Land, northern Devon and vicinity (16) and Smith Sound (7).

The only remaining series measured by Dr Hrdlička is the one of 49 skulls from unknown localities in Greenland. This may be compared with Fürst and Hansen’s Greenland Eskimo series. These writers conclude that: “The anthropological characters cannot contribute to a solution of the question of the migration of the Eskimos [in Greenland], owing to the fact that the homogeneity of their anthropological characters clearly shows that the Eskimos of both the west and the east coasts are of the same racial type.” This conclusion is based principally on a comparison of the distributions of the measurements for unsexed series of crania from different regions of Greenland. Male means computed for three groups into which the total material is divided are given in Table III below for nine of the more important characters. In asking whether the differences between these are significant or not, the standard deviations of Hrdlička’s Greenland Eskimo series (given in Table I) were used, and it should be noted that these are appreciably less than the values for the total series measured by Fürst and Hansen. The only differences greater than three times their probable errors are: C, Eastern less than Western (Δ/(p.e. Δ) = 5·9) and South-Western (6·3); H′, Western less than South-Western (3·3); J, Eastern less than South-Western (3·8); NH, Western less than South-Western (4·0); 100 NB/NH, South-Western less than Western (3·3). No importance can be attached to any of these differences except those for the capacity. The types appear to differ principally in size, and it may be noted that the South-Western series has the largest means in the case of all the absolute measurements except L.
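The ratios quoted for the Greenland sub-groups can be checked with the standard probable error of a difference between two means, 0·6745 σ √(1/n₁ + 1/n₂), taking σ from Hrdlička's Greenland series in Table I as the text describes. The Python sketch below uses a hypothetical function name; because the group means in Table III are rounded, recomputed ratios may differ from the printed ones in the last figure (for H′ it would give about 3·2 against the printed 3·3).

```python
import math

PE_FACTOR = 0.6744898


def ratio_of_means(mean1: float, n1: int, mean2: float, n2: int, sd: float) -> float:
    """|mean1 - mean2| divided by the probable error of the difference,
    assuming both groups share the standard deviation `sd`."""
    pe_diff = PE_FACTOR * sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return abs(mean1 - mean2) / pe_diff


if __name__ == "__main__":
    # Capacity (C): group means and numbers from Table III; sigma = 86.0 (Table I).
    eastern, western, south_western = (1465.8, 32), (1536.2, 95), (1549.0, 48)
    print(round(ratio_of_means(*western, *eastern, sd=86.0), 1))        # 5.9
    print(round(ratio_of_means(*south_western, *eastern, sd=86.0), 1))  # 6.3
    # Bizygomatic breadth (J): South-Western v. Eastern, sigma = 5.74.
    print(round(ratio_of_means(141.1, 53, 137.9, 36, sd=5.74), 1))      # 3.8
```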


TABLE III
Mean Measurements for Male Series of Greenland Eskimo Skulls

Character | Western (Fürst and Hansen) | South-Western (Fürst and Hansen) | Eastern (Fürst and Hansen) | Pooled (Fürst and Hansen) | Unknown (Hrdlička)
C | 1536·2 (95) | 1549·0 (48) | 1465·8 (32) | 1526·8 ± 6·6 (175)* | 1518 ± 9·0 (42)†
L | 188·6 (100) | 188·5 (54) | 187·9 (37) | 188·4 ± ·28 (191) | 189·7 ± ·42 (49)
B | 134·5 (100) | 135·1 (55) | 133·4 (36) | 134·5 ± ·22 (191) | 136·1 ± ·42 (49)
H′ | 137·6 (97) | 139·1 (51) | 138·7 (35) | 138·2 ± ·24 (183) | 139·5 ± ·38 (49)
G′H | 74·3 (99) | 75·6 (56) | 75·0 (36) | 74·8 ± ·21 (191) | 76·1 ± ·39 (46)
J | 139·4 (96) | 141·1 (53) | 137·9 (36) | 139·6 ± ·32 (185) | 140·5 ± ·56 (47)
NH | 53·4 (99) | 54·6 (56) | 53·5 (37) | 53·75 ± ·15 (192) | 52·4 ± ·26 (48)
100 B/L | 71·3 (100) | 71·6 (54) | 70·9 (36) | 71·3 ± ·15 (190) | 71·8 ± ·30 (49)
100 NB/NH | 43·6 (98) | 42·2 (56) | 42·9 (37) | 43·1 ± ·19 (191) | 44·3 ± ·36 (48)

* Some of the means in this column differ slightly from the values given in the paper in the Annals of Eugenics cited, as the latter were found from distributions instead of by direct addition.

† The probable errors in this column were found by using the standard deviations (given in Table I) for 40 skulls of the same series.


The Eskimo population of Greenland may not be quite as racially homogeneous as Fürst and Hansen supposed, but for practical purposes there can be little harm in combining the three series to form a single sample representing it. Some of the pooled means are given in Table III and they may be compared with those for Hrdlička’s Greenland series in the last column of the table. It must be remembered in this case that comparison is being made between measurements taken by different people. Differences greater than three times their probable errors are found only in the case of B (Δ/(p.e. Δ) = 3·4) and NH (4·5). There is reason to believe that the latter difference arises because the nasal height was not determined in precisely the same way for the two series. Hrdlička’s means for all the absolute measurements in the table, except C and NH, are greater than those for Fürst and Hansen’s sample. As the facial height (G′H) is greater it would be anticipated that the nasal height would also be greater, but actually it is significantly the lesser. As the major calvarial chords (L, B and H′) are greater for Hrdlička’s than for Fürst and Hansen’s series, the cranial capacity (C) would also be expected to be greater for the former, but the difference observed is of the opposite sign, though not significant. The divergences observed thus appear to be partly due to slight differences in the technique of measurement, and the size difference may be partly due to a difference in the process of sexing the skulls. In spite of these blemishes the two series of means are in close agreement. The coefficient of racial likeness between them can be computed for 16 characters and a crude value of ·94 ± ·24 is obtained. The largest single-character contribution to this coefficient is for NH (8·2) and the next largest for B (5·0). It is at least possible that the coefficient differs significantly from zero merely on account of the personal equations of the measurers.
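The pooled column of Table III is a sample-size-weighted mean of the three Fürst and Hansen groups, and its probable errors follow 0·6745 σ/√n with the σ's of Table I. The Python sketch below (function names hypothetical) reproduces the pooled capacity and the two probable errors for C. It also evaluates the single-character term n n′/(n + n′) ((x̄ − x̄′)/σ)² of the crude coefficient for NH, using the Egyptian σ of 2·92 and reading the pooled NH mean in Table III as 53·75; this gives the 8·2 quoted above. Treating that figure as this term is an interpretation consistent with the numbers, not a statement made in the paper.

```python
import math

PE_FACTOR = 0.6744898


def pooled_mean(groups):
    """Weighted mean of (mean, n) pairs, returned with the total n."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n, total_n


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a mean of n observations."""
    return PE_FACTOR * sd / math.sqrt(n)


def crl_term(mean1: float, n1: int, mean2: float, n2: int, sd: float) -> float:
    """One character's contribution n n'/(n + n') * ((mean1 - mean2)/sd)**2."""
    return n1 * n2 / (n1 + n2) * ((mean1 - mean2) / sd) ** 2


if __name__ == "__main__":
    # C for the Western, South-Western and Eastern groups of Table III.
    mean_c, n_c = pooled_mean([(1536.2, 95), (1549.0, 48), (1465.8, 32)])
    print(round(mean_c, 1), n_c)             # 1526.8 175
    print(round(pe_of_mean(128.8, 175), 1))  # 6.6 (Fürst and Hansen sigma)
    print(round(pe_of_mean(86.0, 42), 1))    # 9.0 (Hrdlicka sigma)
    # NH: pooled Fürst and Hansen v. Hrdlicka, Egyptian sigma 2.92.
    print(round(crl_term(53.75, 192, 52.4, 48, sd=2.92), 1))  # 8.2
```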
Under the circumstances, it was felt that it would be best to use the pooled
means of the two series of Greenland Eskimo skulls for comparative purposes.
These are given in column 3 of Table IV, and the other means there are for the six
series derived from Hrdlička’s data, in ways described above, and finally adopted
for purposes of comparing different Eskimo types with one another and with
non-Eskimo types. Before treating all the Greenland Eskimo skulls as a single sample,
however, it was thought advisable to make comparisons between the two
sub-samples and the six series relating to Eskimo populations outside Greenland.
Crude and reduced coefficients of racial likeness* between these six and the
Greenland series measured by (a) Hrdlička, (b) Fürst and Hansen, and (c) Hrdlička
and Fürst and Hansen (pooled) are given in Table V. All these coefficients differ
from zero with marked significance. The first of the three series is by far the
smallest. It gives the lowest crude values in all cases, and values markedly
lower than the others in five of the six comparisons. Series (b) gives intermediate
values of the crude coefficients in four of the six triads, and values
only slightly in excess of those for series (c) in the other two. The corresponding
reduced coefficients show a much closer approach to equality, but for five of the
six triads the lowest values are with series (a), while for all six the pooled series
gives intermediate values. This last relation suggests that the process of reducing
the coefficients is effective, as it appears to give a measure of resemblance
independent of the sizes of the samples. If the Greenland series measured by
Hrdlička showed the lowest reduced coefficient in the comparisons with all
six of the other series measured by him, this might be attributed to a difference
between his technique of measurement and that employed by Fürst and Hansen.
But there is one exception, in the comparison with the “Old Igloos” series, and
another explanation of the results may be suggested. Nothing is known
about the origin of the Greenland skulls measured by Hrdlička. Suppose they did not
form a true random sample from the total population of the country, but one
biased in such a way that it bears a slightly closer resemblance to modern Western
Eskimo types than the total population considered as a whole does. In that case its
slightly lower reduced coefficients with five of the six series would be expected.
It is shown below that these five are closely related to one another (see Fig. 1), and
that the Greenland type does not belong to the same group, though it is attached
to it. The “Old Igloos” series also diverges from the Western group in the same
direction as, but to a greater extent than, the Greenland series. On the hypothesis
considered, Hrdlička’s Greenland Eskimo series would thus be expected to be rather farther
removed from the “Old Igloos” series than is Fürst and Hansen’s Greenland series,
which is supposed to be a random sample from the total population of the country.
The reduced coefficients show these relationships,
but it is clear that the hypothesis cannot be rigorously justified. Only the pooled
Greenland series was used in later comparisons.

* These coefficients are defined on pp. 100-102 of this volume of Biometrika.

Seven series were thus finally adopted. Their means are given in
Table IV, and the reduced coefficients of racial likeness between them in Table VI.
Fig. 1 shows the localities from which the material was obtained and the
connections provided by the reduced coefficients less than 10. Three of the Alaskan
series and the Central series are all closely connected with one another. The Western
diverges from this central group in one direction and the Greenland series diverges
from it in a different direction. So far there is a general agreement between the
relationships found and the geographical positions of the populations represented,
but this is not maintained in detail. The Greenland series, for example, is nearest
geographically to the Central, but it bears a closer resemblance to the
North-Western Alaskan than to the Central series. The extremely close resemblance of
the Nunivak Island and Central Eskimo types is also unexpected. This pair gives the only
coefficient which differs from zero by less than three times its probable error,
but the two series show distinctly different relationships when
compared with the others. Experience with other material has shown that the
fact that two series cannot be clearly differentiated when compared directly does
not preclude the possibility that they will be distinguished by other comparisons.
The relationships of the “Old Igloos” series have not yet been considered. This
series was obtained from a site in Alaska within a few miles of some of the others, but it
differs from all the other Eskimo series in being older than they are, and it is also
the smallest series used. The type shows no affinities to any of the modern ones
found in the same area, and it shows only one close connection, which is with the
Greenland series.

Fig. 1. The places of origin and relationships of male series of Eskimo skulls (reduced coefficients of racial likeness; the figures in brackets are the mean cephalic indices).

The material available shows that there are a number of existing Eskimo 
populations of types which can be clearly distinguished from one another, and 
there is evidence of one type which appears to be extinct to-day. The fact that no 
close correlation is found between geographical position and resemblance as 
measured by the reduced coefficient of racial likeness may be due to migrations 
of the peoples represented. Evidence of other extinct Eskimo populations will 
probably be needed in order to throw light on the origin of the present-day 
varieties, but comparison with non-Eskimo material should also aid a solution of 
this problem. 


6. Comparisons of Single Characters. The means of the seven Eskimo series
finally adopted for purposes of racial comparison are given in Table IV. They
relate to 20 characters, and the coefficients of racial likeness were computed for
18 of these. In the process of computation a value, α, is obtained for each
character in each comparison. This value is approximately the square of the ratio of the
difference between two means to its standard error. For the seven series there are
7 × 6/2 = 21 comparisons for each character, except in the case of the capacity (C),
for which the total is 15, as one mean is missing. We may decide, quite arbitrarily,
to consider that an α indicates a significant difference if it is greater than 10. If
samples were drawn from two populations which actually had identical means
for a particular character, then an α greater than 10 would only be expected to
occur once in about 625 trials. The numbers of α’s greater than 10 in the 21
comparisons give estimates of the relative degrees to which the coefficients
are determined by different characters. As is usually found in such comparisons,
there are marked distinctions between the characters when examined in this way.
Some are practically constant for all the series, and others show significant
differences between most pairs of them.
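The counts of comparisons and the rarity of an α above 10 can be checked with a short sketch. It assumes that, when two population means are identical, α behaves like the square of a standard normal deviate, i.e. like χ² with one degree of freedom. This holds only approximately for finite samples. The function name is hypothetical. Under this assumption the chance is roughly 1 in 640, of the same order as the “about 625” given in the text.

```python
import math


def prob_alpha_exceeds(limit):
    """Chance that alpha exceeds `limit` if alpha behaves like chi-square with 1 d.f.

    Treats alpha as the square of a standard normal deviate (an approximation):
    P(Z**2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    """
    return math.erfc(math.sqrt(limit / 2.0))


pairs = math.comb(7, 2)           # comparisons per character among seven series
pairs_capacity = math.comb(6, 2)  # capacity (C): one of the seven means is missing
p = prob_alpha_exceeds(10.0)
print(pairs, pairs_capacity, round(1.0 / p))
```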

The three orbital measurements and the nasal angle show no α’s at all greater
than 10. The nasal height is almost as constant. It shows only one value greater
than the limit chosen, and this is only slightly above the limit: the α is 11·30 for
the Greenland and North-Western series. A second group of characters may be
distinguished by the fact that they show significant differences only in some of the
comparisons between the Western series on the one hand and the other six series
on the other. These are the nasal index (3 α’s > 10), the basio-bregmatic height
(3 α’s > 10) and the chord from nasion to basion (LB: 4 α’s > 10). It can be seen
from Table IV that the Western series has the highest mean for 100 NB/NH and
the lowest means for H’ and LB. Two other characters may be added to this
group. The alveolar angle (A∠) shows 7 of the 21 α’s greater than 10, and 5 of these
(including the only two α’s indicating markedly significant differences) are for
comparisons between the Western and the other series: the Western has the
lowest mean. The nasal breadth shows 5 α’s greater than 10, and 4 of these are for
comparisons between the Western and the other series: the Western has the
highest mean. This disposes of 10 of the 18 characters. Five of these 10 may be
supposed practically constant for all the Eskimo types. The other 5
characters are practically constant for 6 series, but they distinguish these 6 from the
Western Eskimo series.

The remaining 8 characters serve other purposes. The bizygomatic breadth
(J) shows only 5 of its 21 α’s significant, though all these indicate clear divergence.
They are all for comparisons between the Greenland Eskimo and the other
series: the Greenland mean is not distinguished from the “Old Igloos”, but it is
from all the others. The height-length index again distinguishes one series from
all the others, but in this case it is the Point Hope. All its α’s for the character
indicate clear significance, and there is only one other α greater than 10 (viz. 10·52).
The significant differences are more erratic in the case of the upper facial height
(G’H: 6 α’s out of 21 greater than 10) and of the capacity (C: 5 α’s out of 15
greater than 10). The remaining 4 characters are distinguished from all the others
by the fact that they show more significant than insignificant differences. In
each of these cases there are 21 comparisons, and the numbers of α’s greater
than 10 are 13 for L, 14 for B, 15 for 100 B/H’ and 17 for 100 B/L.

In the comparison of a group of series representing closely related populations, 
it is commonly found that the major calvarial chords and the indices derived from
them show a larger percentage of significant differences than any other characters, 
and the cephalic index generally distinguishes the types more effectively, on the 
average, than the calvarial length or breadth from which it is derived. In Table IV 
the series are arranged in order of their mean cephalic indices: L, B and 100 B/H’ 
give very similar orders to this, but the same is not true for any other character. 
By considering the characters singly and then attempting to combine the 
evidence of each, it does not seem to be possible to construct any clear picture of 
the situation. Different characters suggest different conclusions and the advan- 
tage of using a generalized criterion, such as the coefficient of racial likeness, is 
evident. 

The coefficients (Fig. 1) suggest that four of the Eskimo series represent
populations which are all closely related to one another. The Greenland Eskimos
diverge from this central group in one direction. The type known from the
“Old Igloos” skulls diverges in the same direction, but to a greater extent. The
Western Eskimos diverge from the central group in another direction. This
arrangement is suggested by the calvarial breadth (B) and two indices (100 B/L
and 100 B/H’) which involve this measurement. The same arrangement is not
suggested by any other character for which means are given in Table IV, but it is
suggested by another index involving B, viz. the “cranio-facial” (100 J/B). Means for this,
computed not from individual measurements but from the means of the two
chords, are: “Old Igloos” 106·5 (26), Greenland 103·7 (227), North-Western
103·6 (112), Central 102·4 (42), Nunivak Island 101·6 (45), Point Hope 103·2 (124)
and Western 99·9 (221). Dr von Bonin has given* means of this index for 49 male
cranial series, and the highest value in his list is 104·9 for Loyalty Islanders.
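The index means above were computed from the means of the two chords rather than skull by skull. The following sketch shows the distinction. The J and B measurements are invented for illustration, and the function names are hypothetical. The index of the means and the mean of the individual indices are usually close, but in general they are not identical.

```python
from statistics import mean


def index_from_means(j_values, b_values):
    """Cranio-facial index 100 J/B computed from the two chord means (as in the text)."""
    return 100.0 * mean(j_values) / mean(b_values)


def mean_of_individual_indices(j_values, b_values):
    """Alternative: average of 100 J/B computed for each skull separately."""
    return mean(100.0 * j / b for j, b in zip(j_values, b_values))


# Hypothetical bizygomatic breadths (J) and calvarial breadths (B) in mm.
J = [140.0, 136.0, 144.0]
B = [135.0, 132.0, 140.0]
print(round(index_from_means(J, B), 3), round(mean_of_individual_indices(J, B), 3))
```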

The mean calvarial length of the “Old Igloos” series (192·5) is almost as great
as the largest recorded for any male series of skulls. The nasal breadth of 23·0
(Greenland and Central series) is very close to the extreme found for all races,
and the nasal index of 42·7 (Central) appears to be the lowest yet recorded.
These, however, are not characters which distinguish the “Old Igloos” and
Greenland types from all the other Eskimo types.

It may be noted that the Eskimo skull also appears to be quite extreme among
modern races of man in having the “flattest” facial skeleton, though its nasal
bridge is not peculiarly flat.† It has also been shown that its malar bones are
extremely large. An index expressing their vertical arcs as percentages of
their horizontal arcs makes a clear distinction between the Eskimo and all other
races for which data are available.‡ These measurements have only been given
for a single series of Eskimo skulls, viz. one made up principally of specimens
from Greenland, and their means for the series measured by Dr Hrdlička should
be of particular interest. Other features which cannot be estimated from any
measurements available, such as the median sagittal crest, also demonstrate that
the Eskimo type is peculiarly specialized.


7. Comparisons between Eskimo and Asiatic Series. As a preliminary to any 
discussion of the “origin” of the Eskimo, it is clear that comparisons must be 
made between the different varieties found and series representing other races. 
The type is certainly one of the most specialized known, and the fact that several 
distinct varieties of it are found should aid the solution of problems concerning 
its relationships. In spite of the striking resemblance of the Chancelade to modern 
Eskimo skulls, there is no race known to have existed in Europe since palaeolithic 
times which is closely similar to that of the northern people. It is to be expected, 
however, that close affinities will be found with certain American and Asiatic 
races. It is hoped that the results of statistical comparisons between the Eskimo 
and North American cranial material will be presented later,§ and only com- 
parisons with Asiatic material are considered here. 

Coefficients of racial likeness for all pairs of 26 male Asiatic series have been
published.‖ A full comparison of these by the same method with the seven Eskimo
series would necessitate the computation of 182 coefficients. But it has been
decided that for purposes of classification only the lower orders of reduced coefficients
should be used, and the vast majority of these 182 can be expected
to be of higher orders which can be neglected. The method described below makes
it possible to decide, from a comparison of the means for a few characters, whether
the sets of means for two series may possibly give a reduced coefficient less than
a particular value (19), or whether it will be safe to assume that the coefficient
will be greater than this limit. If the simple test indicates the second of these
conclusions, then there is no need to calculate the coefficient, as it will not be
needed in the classification. The method has been used in the comparison of other
groups of series, and it makes a large amount of computation unnecessary.

* Biometrika, xxvii (1936), p. 133.

† See T. L. Woo and G. M. Morant, “A Biometric Study of the ‘Flatness’ of the Facial Skeleton
in Man”, ibid. xxvi (1934), pp. 196-250.

‡ See T. L. Woo, “A Biometric Study of the Human Malar Bone”, ibid. xxix (1937), pp. 113-23.

§ In a paper by Dr von Bonin and the writer which is nearly completed.

‖ T. L. Woo and G. M. Morant, “A Preliminary Classification of Asiatic Races based on Cranial
Measurements”, Biometrika, xxiv (1932), pp. 108-34. The Tibetan B series of 15 skulls was omitted
because it is too short for the purpose in view.

In classifying the 26 Asiatic series, all reduced coefficients greater than 19 were
neglected. It was found for the total of 325 (= 26 × 25/2) comparisons that the
calvarial length, breadth and height and the three indices derived from these
chords gave numbers of significant α’s larger than, or almost as large as, the
numbers given by any other of the 31 characters used. The values of the coefficients
are evidently determined in large part by these six measurements, though others
also play important rôles. The maximum differences between the means found in
the 54 comparisons giving reduced coefficients less than 19 are:


L          B          H’         100 B/L    100 H’/L    100 B/H’
6·7 mm.    6·1 mm.    6·3 mm.    5·4        3·4         6·5


These values are, of course, much less than the corresponding maximum
differences which would be found in all possible comparisons between pairs
of the 26 Asiatic series. Suppose that any one of these series could be compared with a new
Asiatic series, and that any one of the differences of the means for the six characters
were found to be greater than the value for that character given above. Then it is
unlikely that the reduced coefficient found in this case would be less than 19.
Under the same circumstances, it is still less likely that one of the Asiatic series and a
non-Asiatic series would give a reduced coefficient less than 19. These
considerations can be used to select, in new comparisons, the only pairs of series
likely to provide reduced coefficients less than the limit which has
been arbitrarily chosen. The ranges of the differences actually used for this
purpose were those above with ·1 added to each, viz. L 6·8 mm., B 6·2 mm.,
H’ 6·4 mm., 100 B/L 5·5, 100 H’/L 3·5 and 100 B/H’ 6·6.
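The screening rule just described can be sketched as a simple check. It uses the six limits quoted in the text. The function name and the three sets of means are hypothetical and serve only for illustration. A pair is passed on for full computation only if no difference exceeds its limit.

```python
# Limits on the absolute difference of means, from the text (observed maxima + 0.1).
LIMITS = {
    "L": 6.8,          # calvarial length, mm
    "B": 6.2,          # calvarial breadth, mm
    "H'": 6.4,         # basio-bregmatic height, mm
    "100 B/L": 5.5,
    "100 H'/L": 3.5,
    "100 B/H'": 6.6,
}


def worth_computing(means_1, means_2, limits=LIMITS):
    """False if any difference exceeds its limit (reduced coefficient presumed > 19)."""
    for character, limit in limits.items():
        if abs(means_1[character] - means_2[character]) > limit:
            return False  # skip the full calculation for this pair
    return True  # the coefficient might be below 19, so compute it


# Hypothetical means for three series, for illustration only.
series_a = {"L": 183.0, "B": 140.0, "H'": 133.0, "100 B/L": 76.5, "100 H'/L": 72.7, "100 B/H'": 105.3}
series_b = {"L": 186.0, "B": 137.0, "H'": 135.0, "100 B/L": 73.7, "100 H'/L": 72.6, "100 B/H'": 101.5}
series_c = {"L": 192.0, "B": 135.0, "H'": 136.0, "100 B/L": 70.3, "100 H'/L": 70.8, "100 B/H'": 99.3}
print(worth_computing(series_a, series_b), worth_computing(series_a, series_c))  # True False
```

In the second pair the difference in L alone (9 mm.) exceeds its limit of 6·8 mm., so no further calculation is needed.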

Comparisons of means restricted to these six characters were first made
between each of the seven Eskimo series, on the one hand, and each of the 26
Asiatic series, on the other. If for a particular pair the difference between
the means was found to exceed the limit fixed for any one or
more of the characters, then no calculation was carried out, as it may be presumed
that all such pairs would give reduced coefficients of racial likeness greater than
19. This speedy test made it unnecessary to calculate 161 of the total 182
possible coefficients. The remaining 21* might give connections of the order finally
considered, but the majority of them would still be expected to be of a higher
order. In each of these cases the α’s were first calculated for the characters which
showed most significant differences, and it was generally possible to see from a
few of these that the reduced coefficient must exceed 19. It was only necessary
to calculate two coefficients in full. One of these was found to exceed 19,
while the other is:


Western Eskimo (n = 220·8) and Chukchi (34·1): reduced coefficient = 7·05 ± ·45
for 13 characters.


In this comparison the nasal index (α = 14·0) is the only character which gives an
α greater than 10. The Chukchi series (measured by Fridolin†) represents a
people inhabiting the extreme north-east of Asia, generally supposed to have
close physical affinities to the Eskimos. It showed only one reduced coefficient
less than 19 with the other Asiatic series, viz. that of 18·27 ± ·65 with the
Prehistoric Chinese. The modern Chinese can thus be linked to the Greenland
Eskimo type by a number of intermediate types, the sequence being: Modern
Chinese, Prehistoric Chinese, Chukchi, Western Eskimo, Central Eskimo,
Greenland Eskimo.


* The test also allows four comparisons between the Tibetan B and the Eskimo series, but all
the reduced coefficients were found to be greater than 19.

† The calvarial height given for the Chukchi skulls is the vertical from the basion (H) in place of
the more usual basio-bregmatic (H’). One mm. was subtracted from the mean H to give an
approximation to H’, as average differences very close to this have been found for all the longest series
for which both heights have been given.
1. Introduction. The validity of Fisher’s z-test in the practical situations in
which it has been applied has been the subject of much discussion.* In general,
the mathematical distribution of z follows from the assumption that the sample
observations $x_i$ ($i = 1, 2, \ldots, n$) can be written

$$x_i = c_{i1}\theta_1 + c_{i2}\theta_2 + \cdots + c_{ir}\theta_r + y_i, \qquad (1)$$

where the c’s are known numbers, the θ’s are r (< n) unknown parameters, and the
y’s are normally and independently distributed about zero with standard
deviations proportional to known numbers. Results following from such a
starting-point may be termed results from normal theory. In practice we may
not wish to make all the above assumptions regarding the y’s. To a certain
extent it can be shown that, even without making them, we can still use the tests based upon
them. Of especial interest are the cases of experimentation into which
randomization enters as part of the structure. R. A. Fisher has pointed out† that, in any
such case, it is possible to carry through arithmetical calculations, from which
the hypothesis under test may be judged, without making any assumptions
whatever. These calculations are lengthy. One can, however, consider only certain
aspects of them, which are sufficient to give useful comparisons with the results
from normal theory. This I have done in the present paper.

* For a bibliography of the subject see a paper by T. Eden and F. Yates entitled “On the
Validity of Fisher’s z-test when Applied to an Actual Example of Non-Normal Data”, J. agric.
Sci. xx (1933), pp. 6-16. Other references are given later.

† See, for instance, The Design of Experiments, Oliver and Boyd (1935), p. 51.


2. Randomized Blocks. The first situation to be discussed will be that of
Randomized Blocks. Here the treatments under test are represented once and
once only in each of a number of blocks. The yields given by an experiment may
be denoted by $x_{i(k)}$, where $i$ (= 1, 2, ..., n) are the blocks and $k$ (= 1, 2, ..., s) the
treatments. The usual procedure employed to test whether the treatments can
be regarded as equivalent is to perform on the yields the analysis of Table I and


TABLE I
Analysis of Variance for Randomized Blocks

Source               Degrees of Freedom      Sum of Squares    Mean Square
Between Treatments   f₁ = (s − 1)            S₁                v₁ = S₁/f₁
Between Blocks       f₂ = (n − 1)            S₂
Residual             f₀ = (n − 1)(s − 1)     S₀                v₀ = S₀/f₀
Total                (ns − 1)                S


to calculate the criterion $z = \tfrac{1}{2}\log_e(v_1/v_0)$. This criterion is then referred to a
certain theoretical distribution, namely the distribution of z obtained by assuming

$$x_{i(k)} = \lambda_i + y_{i(k)}, \qquad (2)$$

where the λ’s are unknown block means and the y’s are all normally and
independently distributed about zero, with the same unknown standard deviation.
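The analysis of Table I and the criterion z can be computed directly from a table of yields, as the following minimal sketch shows. The function name and the input layout are assumptions for illustration: a list of n rows, one per block, each holding the s treatment yields $x_{i(k)}$. The sketch needs both mean squares to be positive so that the logarithm exists.

```python
import math


def randomized_blocks_z(x):
    """Analysis of variance of Table I for yields x[i][k] (n blocks by s treatments)."""
    n, s = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n * s)
    block_means = [sum(row) / s for row in x]
    treatment_means = [sum(x[i][k] for i in range(n)) / n for k in range(s)]
    S = sum((x[i][k] - grand) ** 2 for i in range(n) for k in range(s))  # total
    S1 = n * sum((t - grand) ** 2 for t in treatment_means)              # treatments
    S2 = s * sum((b - grand) ** 2 for b in block_means)                  # blocks
    S0 = S - S1 - S2                                                     # residual
    f1, f0 = s - 1, (n - 1) * (s - 1)
    v1, v0 = S1 / f1, S0 / f0
    return {"S": S, "S0": S0, "S1": S1, "S2": S2, "v0": v0, "v1": v1,
            "z": 0.5 * math.log(v1 / v0)}


result = randomized_blocks_z([[1, 2, 4], [3, 3, 3]])
print(result["v0"], result["v1"], result["z"])
```

In this small example the two mean squares both equal 7/6, so z is zero apart from rounding.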
To investigate the meaning and extent of these assumptions it is necessary to
consider further details of the experimental arrangement, and the exact manner
in which the hypothesis that the treatments are equivalent is usually formulated.
For convenience let the plots in each block be numbered $j = 1, 2, \ldots, s$. Then the
yield which the kth treatment would give, if applied under the experimental
conditions, on the jth plot of the ith block may be denoted by $x_{ij(k)}$. For any plot
$(i, j)$, of course, only one of the quantities $x_{ij(k)}$ is real, viz. the one for the
treatment k which is actually used on that plot in the experiment. The other $x_{ij(k)}$
are hypothetical, based on the conception of what might happen if the experiment
could be repeated under the same essential conditions, using in turn every
treatment on the plot $(i, j)$. The hypothesis that the experiment is to test must,
for statistical purposes, be expressed as a relation holding in some hypothetical
population. In this case the population consists of the values $x_{ij(k)}$.* In the
literature the hypothesis has been formulated in terms of these x’s in two distinct
ways.

* For a further discussion of this manner of defining our statistical population see the
concluding section of the paper.

(i) R. A. Fisher has considered the hypothesis that the treatments would
give equivalent results on every individual plot, i.e. that $x_{ij(k)}$ would be the same
for all k.

(ii) J. Neyman has suggested that we should allow the possibility that the
treatments would affect individual plots differentially. On this view we should consider the
hypothesis that the average yield of each treatment, if applied over the whole
experimental field, would be the same. This means that $x_{\cdot\cdot(k)}$ would be the same
for all k, where $x_{\cdot\cdot(k)}$ is the mean of $x_{ij(k)}$ over all $(i, j)$.

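The first of these “null”* hypotheses is the one which will be considered here.
We may express it by the following equation:

$$x_{ij(1)} = x_{ij(2)} = \cdots = x_{ij(s)} = x_{ij}, \qquad (3)$$

where $x_{ij}$ denotes the yield of plot $(i, j)$, whichever treatment is applied to it.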
The other “null” hypothesis was discussed in a paper given recently by Dr
Neyman to the Royal Statistical Society,† in relation to the same problem that
I am concerned with here. He, too, investigated how the z-test is influenced by the
fact that the assumptions of normality and independence in equation (2) are not
exactly satisfied. He came to certain conclusions and expressed the opinion that
further investigation was desirable. The results that I obtain in the present paper
may, I think, profitably be compared with his. I should emphasize, however,
that the “null” hypothesis which I am using is that of equation (3) and that the
situations are therefore not exactly the same. However, as Dr Neyman points
out in commenting on the discussion after his paper, his results are applicable to
the “null” hypothesis of (3) if some of the quantities in his equations are given
certain values. On the other hand, the methods I adopt here are not applicable
to his more general “null” hypothesis.

We must now refer to the essential point of the arrangement of Randomized
Block experiments. This is as follows. In every block the s treatments are
assigned entirely at random to the s plots available for them. This means that,
if the hypothesis of equivalent treatments is true (i.e. if (3) is satisfied), the yields
$x_{i(k)}$ ($k = 1, 2, \ldots, s$) given by the experiment will be a random arrangement of
$x_{ij}$ ($j = 1, 2, \ldots, s$). For instance, Fig. 1 (a) gives a possible set of yields $x_{ij}$ for
a field consisting of four blocks with three plots each. Fig. 1 (b) shows one possible
way in which the treatments may be arranged on this field. In the first block,
treatments 1, 2 and 3 are on plots 2, 1 and 3, respectively; this is only one of 3!
possible arrangements. Similarly, in each of the other three blocks we have illustrated
one of the 3! possible arrangements. Hence, taken as a whole, Fig. 1 (b) represents
one of $(3!)^4$ possible arrangements.


* The term “null hypothesis” is used in the literature to denote the hypothesis that the 
treatments are equivalent. 

† “Statistical Problems in Agricultural Experimentation”, J. Roy. statist. Soc. Suppl. ii,
No. 2 (1935), pp. 107-180.




24 On the z-Test in Randomized Blocks and Latin Squares 


The method of randomization gives all the possible arrangements an equal
chance of occurring. Corresponding to each arrangement there will be a different
reclassification of the yields by block and treatment. Fig. 1 (c), for instance, shows
the reclassification of the yields of Fig. 1 (a) when the arrangement of Fig. 1 (b) is
applied to the field. From the reclassified yields the analysis of variance of
Table I can be worked out and the value of z computed. The $(3!)^4$ arrangements
will each lead to a value of z, and the z of the experiment may be regarded as
randomly selected from this distribution of values. The question whether the
theoretical z-distribution of normal theory gives a valid test of the hypothesis of
treatment equivalence therefore involves a comparison of the theoretical
distribution of z with the distribution which would be obtained by taking all the
possible arrangements.*
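The distribution over all possible arrangements can be enumerated directly for a field as small as that of Fig. 1. The sketch below is an illustration only. The yields are hypothetical, since the values of Fig. 1 are not fully recoverable, and the function names are invented. Within each block, S₀ + S₁ and the block totals do not depend on the arrangement, so z increases with S₁. Arrangements can therefore be ranked by S₁ alone, and no logarithms are needed.

```python
from itertools import permutations, product


def treatment_ss(table):
    """S1 = n * sum over k of (treatment mean - grand mean)^2, for table[i][k]."""
    n, s = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * s)
    return n * sum((sum(row[k] for row in table) / n - grand) ** 2 for k in range(s))


def randomization_test(table):
    """Share of the (s!)^n within-block arrangements whose S1 is at least the observed S1."""
    observed = treatment_ss(table)
    values = [treatment_ss(arrangement)
              for arrangement in product(*(permutations(block) for block in table))]
    extreme = sum(v >= observed - 1e-9 for v in values)  # small tolerance for rounding
    return extreme / len(values), len(values)


# Hypothetical reclassified yields x_i(k): 4 blocks (rows) by 3 treatments (columns).
yields = [[12, 16, 10], [20, 25, 22], [18, 15, 19], [40, 31, 37]]
share, count = randomization_test(yields)
print(count, share)
```

With four blocks of three plots the enumeration covers all (3!)⁴ = 1296 arrangements. The share printed is the proportion of them giving a z at least as large as the observed one.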


Fig. 1 (a). Example of possible yields $x_{ij}$ on a 4-block by 3-plot field.
Fig. 1 (b). Possible arrangement of treatments k (= 1, 2, 3) on the field of Fig. 1 (a).
Fig. 1 (c). Reclassification of yields $x_{i(k)}$ obtained by applying the arrangement of Fig. 1 (b) to the field of Fig. 1 (a).


* For brevity, any frequency constants calculated over all the possible random arrangements
will be termed constants calculated from randomization. It should be emphasized that we are
only considering what happens if the hypothesis of equation (3) is really true.

3. Normal Theory and Randomization Compared. One approach to the
comparison of the z-distribution from normal theory with that from randomization
is to take separately the mean squares $v_0$ and $v_1$, of which z is a function. In normal
theory $v_0$ and $v_1$ are independently distributed and their mean values are both
$\sigma^2$, where σ is the standard deviation of a single y in equation (2). From
randomization it is found that $v_0$ and $v_1$ have equal expectations. They are not, however,
independent, since the sum $(S_0 + S_1)$ is constant. It is the total sum of squares
within blocks and therefore does not depend on the manner in which the treatments
are assigned within blocks. The parallelism between the two theories also breaks
down if we consider the variances of $v_0$ and $v_1$.

In the normal theory $v_0$ and $v_1$ are distributed as $\chi_0^2\sigma^2/f_0$ and $\chi_1^2\sigma^2/f_1$
respectively, where $\chi_0^2$ and $\chi_1^2$ are independent χ²’s with $f_0$ and $f_1$ degrees of freedom.
The variances of $v_0$ and $v_1$ are therefore $2\sigma^4/\{(n-1)(s-1)\}$ and $2\sigma^4/(s-1)$, i.e.
they are in the ratio $1:(n-1)$. From randomization, since $(S_0 + S_1)$ is constant, the
variance of $S_0$ must be the same as that of $S_1$. Therefore $v_0$ and $v_1$ have variances
proportional to $1/f_0^2$ and $1/f_1^2$, i.e. variances in the ratio $1:(n-1)^2$. With such
discrepancies between results from normal theory and from randomization, it
would be unsafe to conclude, from the circumstance that the expectations of $v_0$
and $v_1$ are in agreement, that the z-distributions will also compare favourably.
It is advisable to try to obtain some results for the z-distribution directly,* and
this I have done by means of a simple transformation.
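The contrast between the two variance ratios can be checked by brute force on a small field. The sketch enumerates every within-block arrangement of a hypothetical 3 × 3 field using exact fractions. It then compares the randomization means and variances of v₀ and v₁. The field values and function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def mean_squares(table):
    """Exact (v0, v1) of Table I for one reclassification table[i][k]."""
    n, s = len(table), len(table[0])
    x = [[Fraction(v) for v in row] for row in table]
    grand = sum(map(sum, x)) / (n * s)
    block = [sum(row) / s for row in x]
    treat = [sum(row[k] for row in x) / n for k in range(s)]
    S1 = n * sum((t - grand) ** 2 for t in treat)
    S0 = sum((x[i][k] - block[i] - treat[k] + grand) ** 2
             for i in range(n) for k in range(s))
    return S0 / ((n - 1) * (s - 1)), S1 / (s - 1)


def randomization_moments(table):
    """(mean, variance) of v0 and of v1 over all within-block arrangements."""
    pairs = [mean_squares(a) for a in product(*(permutations(b) for b in table))]

    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((v - m) ** 2 for v in values) / len(values)

    return mean_var([p[0] for p in pairs]), mean_var([p[1] for p in pairs])


field = [[1, 2, 4], [3, 5, 9], [2, 7, 6]]  # hypothetical yields: n = 3 blocks, s = 3
(m0, var0), (m1, var1) = randomization_moments(field)
print(m0 == m1, var1 / var0)  # True 4
```

Here n − 1 = 2. Normal theory would give a variance ratio of 2, whereas randomization gives (n − 1)² = 4.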

It is of interest to note here a sampling experiment which was designed to 
compare directly the z-distribution from randomization with that from normal 
theory, for one particular set of data. This practical investigation, carried out 
by T. Eden and F. Yates,† did show close agreement between the z-distributions
obtained in the two ways. We shall return to their example later. 

Instead of z, we can equally well use the function of z

$$U = \frac{S_1}{S_0 + S_1} = \frac{f_1 e^{2z}}{f_0 + f_1 e^{2z}}, \qquad (4)$$

which increases monotonically with z. Instead of saying that we reject the
hypothesis of equivalent treatments when $z > z_0$ (say), we shall now say that we
reject when $U > U_0$, where $U_0$ is related to $z_0$ by equation (4). If the U
distributions from normal theory and from randomization compare favourably, then
necessarily the z-distributions will do so also. The convenience of U lies in the
fact that in the randomization procedure $(S_0 + S_1)$ is constant, and thus only the
variation of $S_1$ need be considered.
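Converting a critical value of z into the corresponding critical value of U, and back, takes only a few lines. This sketch assumes U = S₁/(S₀ + S₁), the form implied by the remark that only the variation of S₁ matters, together with v₁ = S₁/f₁, v₀ = S₀/f₀ and z = ½ logₑ(v₁/v₀). The function names and the value of z₀ are hypothetical.

```python
import math


def u_from_z(z, f0, f1):
    """U = S1/(S0 + S1) in terms of z = 0.5*log(v1/v0), with v1 = S1/f1 and v0 = S0/f0."""
    r = (f1 / f0) * math.exp(2.0 * z)  # r = S1/S0
    return r / (1.0 + r)


def z_from_u(u, f0, f1):
    """Inverse of u_from_z, valid for 0 < u < 1."""
    return 0.5 * math.log((f0 / f1) * u / (1.0 - u))


# n = 4 blocks and s = 3 treatments give f0 = 6 and f1 = 2 (Table I).
f0, f1 = 6, 2
z0 = 0.5  # a hypothetical critical value, not taken from any table
U0 = u_from_z(z0, f0, f1)
print(U0, z_from_u(U0, f0, f1))
```

Because the map is monotone, rejecting when z > z₀ and rejecting when U > U₀ are the same rule.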

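The comparison of the U distributions will be made through the medium of
their first two moments. In the normal theory we have

$$U_N = \frac{\chi_1^2}{\chi_0^2 + \chi_1^2},$$

where $\chi_0^2$ and $\chi_1^2$ are independently distributed as $\chi^2$ with degrees of freedom
$f_0 = (n-1)(s-1)$ and $f_1 = (s-1)$, respectively. It follows that the distribution of
$U_N$ is

$$p(U_N) = \text{const.} \times U_N^{\frac{1}{2}f_1 - 1}(1 - U_N)^{\frac{1}{2}f_0 - 1}. \tag{5}$$

The moments of this distribution are

$$\mu_1'(U_N) = \frac{f_1}{f_0 + f_1} = \frac{1}{n}, \qquad \mu_2'(U_N) = \frac{f_1(f_1 + 2)}{(f_0 + f_1)(f_0 + f_1 + 2)} = \frac{s + 1}{n(ns - n + 2)}, \tag{6}$$

$$\sigma^2_{U_N} = \frac{2(n-1)}{n^2(ns - n + 2)}. \tag{7}$$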
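As a check on (5) to (7): the normal-theory $U_N$ is a Beta variable with parameters $f_1/2$ and $f_0/2$, so its moments follow from the standard Beta formulas. The minimal sketch below (hypothetical helper names, exact rational arithmetic from Python's `fractions` module) confirms that these reduce to $1/n$, $(s+1)/\{n(ns-n+2)\}$ and $2(n-1)/\{n^2(ns-n+2)\}$; for n = 8, s = 4 the variance is 7/832, about .0084.

```python
from fractions import Fraction as Fr


def beta_moments(a, b):
    """Mean, raw second moment and variance of a Beta(a, b) variable."""
    m1 = a / (a + b)
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return m1, m2, m2 - m1 * m1


def blocks_normal_moments(n, s):
    """Moments of U_N for n randomized blocks of s plots: U_N ~ Beta(f1/2, f0/2), see (5)."""
    f0, f1 = (n - 1) * (s - 1), s - 1
    return beta_moments(Fr(f1, 2), Fr(f0, 2))


m1, m2, var = blocks_normal_moments(8, 4)
print(m1, m2, var, float(var))
```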


* Dr Neyman, in the discussion after his paper already referred to, pointed out this advisability
of considering the z-distribution directly when any investigation of the validity of the z-test is
being made.

† “On the Validity of Fisher's z-test”, loc. cit.
The suffix N is here used to denote that the moments are calculated from the
normal theory. The suffix R will be used for the moments of U from randomization.
The calculation of these moments is made easier by denoting the means of the
yields $x_{ij}$ over the blocks by $B_i$, and the deviations from these means by $u_{ij}$.
Thus

$$x_{ij} = B_i + u_{ij}, \tag{8}$$

where $\sum_j u_{ij}$ is zero. The experimental yields $x_{i(k)}$ will then be written

$$x_{i(k)} = B_i + y_{i(k)}. \tag{9}$$


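Using the dot notation to indicate that a mean is being taken over all values
of the letter replaced by the dot, we have

$$(S_0 + S_1) = \sum_i\sum_k (x_{i(k)} - x_{i.})^2 = \sum_i\sum_k y_{i(k)}^2,$$

since $x_{i.}$ must equal $B_i$. Also, since $x_{i(k)}$ $(k = 1, 2, \ldots, s)$ are a random arrangement
of $x_{ij}$ $(j = 1, 2, \ldots, s)$, $y_{i(k)}$ must be a random arrangement of $u_{ij}$. Hence

$$(S_0 + S_1) = \sum_i\sum_j u_{ij}^2, \qquad S_1 = n\sum_k (x_{.(k)} - x_{..})^2 = \frac{1}{n}\sum_k\Big(\sum_i y_{i(k)}\Big)^2. \tag{10}$$

Since $\sum_j u_{ij}$ is zero, the expectation of any $y_{i(k)}$ is zero. Further, since the arrangements
in different blocks are made independently, the y's in different blocks are
independent. Hence, using E to denote expectation, we have

$$E(S_1) = \frac{1}{n}\sum_k\sum_i E\big(y_{i(k)}^2\big) = \frac{1}{n}\sum_i\sum_j u_{ij}^2, \tag{11}$$

$$E(U_R) = \frac{E(S_1)}{S_0 + S_1} = \frac{1}{n}, \tag{12}$$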
which is the same as the mean from normal theory. For the variance of $U_R$, first
consider

$$n^2S_1^2 = \Big\{\sum_k\Big(\sum_i y_{i(k)}\Big)^2\Big\}^2 = \sum_k\sum_{k'}\sum_i\sum_m\sum_p\sum_q y_{i(k)}\,y_{m(k)}\,y_{p(k')}\,y_{q(k')}.$$

This summation is taken for k and k′ over 1, 2, ..., s, and for i, m, p and q over
1, 2, ..., n. There are thus $n^4s^2$ terms, but not all of these contribute to the expectation
of $n^2S_1^2$. For if, in any term, any one of the subscripts i, m, p and q is not
equal to some one of the other three, the expectation of that term must be zero.
Hence*

$$E(n^2S_1^2) = \sum_i E\Big[\sum_k y_{i(k)}^4 + \sum_k\sum_{k' \ne k} y_{i(k)}^2y_{i(k')}^2\Big] + (s^2 + 2s)\sum_i\sum_m{}' E\big(y_{i(k)}^2\big)E\big(y_{m(k)}^2\big) + 2s(s-1)\sum_i\sum_m{}' E\big(y_{i(k)}y_{i(k')}\big)E\big(y_{m(k)}y_{m(k')}\big), \tag{13}$$

k and k′ in the last term standing for any two different treatments.

* Single summations are over all values of the letter indicated. $\sum\sum'$ indicates a summation
over all values of i and m excluding i = m, i.e. the summation includes both $y_{i(k)}y_{m(k)}$ and
$y_{m(k)}y_{i(k)}$, the fact that these terms are the same being ignored. This convention is used throughout.
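Now $E\big(y_{i(k)}^2\big) = \big(\sum_j u_{ij}^2\big)/s$, and for $k \ne k'$ we have

$$E\big[y_{i(k)}y_{i(k')}\big] = \frac{\sum_j\sum_{j'}{}' u_{ij}u_{ij'}}{s(s-1)} = -\frac{\sum_j u_{ij}^2}{s(s-1)}.$$

The first term of (13) is simply $\sum_i\big(\sum_j u_{ij}^2\big)^2$, since $\sum_k y_{i(k)}^2 = \sum_j u_{ij}^2$ for every arrangement.
Hence (13) gives

$$E(n^2S_1^2) = \sum_i\Big(\sum_j u_{ij}^2\Big)^2 + \frac{s+1}{s-1}\sum_i\sum_m{}'\Big(\sum_j u_{ij}^2\Big)\Big(\sum_j u_{mj}^2\Big).$$

Hence from (10),

$$E\big(U_R^2\big) = \frac{(s+1) - 2\lambda}{n^2(s-1)}, \tag{14}$$

where

$$\lambda = \frac{\sum_i\big(\sum_j u_{ij}^2\big)^2}{\big(\sum_i\sum_j u_{ij}^2\big)^2}. \tag{15}$$

Since the mean $U_R$ is $1/n$, we get from (14)

$$\sigma^2_{U_R} = \frac{2(1-\lambda)}{n^2(s-1)}. \tag{16}$$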
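Equations (12) and (16) can be confirmed by brute force on a toy layout, since with n blocks of s plots there are only $(s!)^n$ equally likely arrangements. The sketch below is a hypothetical illustration (made-up yields, not data from this paper, and hypothetical helper names): it enumerates every within-block arrangement, computes $U = S_1/(S_0+S_1)$ with $S_1 = (1/n)\sum_k(\sum_i y_{i(k)})^2$, and compares the exact mean and variance with $1/n$ and $2(1-\lambda)/\{n^2(s-1)\}$.

```python
import itertools


def block_residuals(x):
    """u_ij = x_ij - B_i, the deviations from the block means of (8)."""
    return [[v - sum(row) / len(row) for v in row] for row in x]


def lam(u):
    """lambda of (15): sum_i a_i^2 / (sum_i a_i)^2, where a_i = sum_j u_ij^2."""
    a = [sum(v * v for v in row) for row in u]
    return sum(ai * ai for ai in a) / sum(a) ** 2


def randomization_moments(x):
    """Exact mean and variance of U = S1/(S0+S1) over all within-block randomizations."""
    n, s = len(x), len(x[0])
    u = block_residuals(x)
    total = sum(v * v for row in u for v in row)  # S0 + S1, constant by (10)
    values = []
    for arr in itertools.product(itertools.permutations(range(s)), repeat=n):
        # arr[i][k] is the plot of block i receiving treatment k, so y_i(k) = u[i][arr[i][k]]
        s1 = sum(sum(u[i][arr[i][k]] for i in range(n)) ** 2 for k in range(s)) / n
        values.append(s1 / total)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var, lam(u)


# Hypothetical yields: 3 blocks x 3 treatments (216 arrangements).
x = [[10.0, 12.0, 17.0], [20.0, 21.0, 19.0], [5.0, 9.0, 4.0]]
n, s = len(x), len(x[0])
mean, var, l = randomization_moments(x)
print("mean:", mean, "  1/n:", 1 / n)
print("variance:", var, "  2(1-lambda)/{n^2(s-1)}:", 2 * (1 - l) / (n * n * (s - 1)))
```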

As the mean U is the same from normal theory and from randomization, a 
comparison of the two U distributions can be made, to a first approximation, from 
the variances of equations (7) and (16). In this discussion attention will be 
focussed on what has been termed by Neyman and Pearson the first kind of error, 
i.e. the risk of rejecting the hypothesis that the treatments are equivalent when 
it is actually true, as distinct from the second kind of error which occurs when we 
fail to detect differentiation where it really exists. Suppose we wish the risk of the
first kind of error to be α. Let $U_0$ be the value from normal theory such that
$P(U_N > U_0) = \alpha$. Then the rule adopted to test whether the experimental results
are consistent with there being no real differences in treatments is to reject this
hypothesis if the U of the experiment is greater than $U_0$. The real chance of the
first kind of error should not, however, be drawn from the $U_N$ distribution but
from the $U_R$ distribution. Therefore we need to know how $P(U_R > U_0)$ compares
with α. Roughly, we may say that if $\sigma_{U_R} < \sigma_{U_N}$ the chance of $U_R > U_0$ will be less
than α; on the other hand, if $\sigma_{U_R} > \sigma_{U_N}$ the chance of $U_R > U_0$ will be greater than α.

From equation (16) it is seen that, apart from n and s, the comparison of $\sigma_{U_R}$
and $\sigma_{U_N}$ involves only the single function λ of the plot yields. This function
depends on the relative sizes of $\sum_j u_{ij}^2$ in the different blocks, i.e. on the relative
sizes of the observed variances within the blocks. Now λ has its minimum value
when $\sum_j u_{ij}^2$ is the same for each block; λ is then equal to $1/n$. Further,
since each $\sum_j u_{ij}^2$ is essentially positive, λ has its maximum value when
$\sum_j u_{ij}^2$ is zero in every block except one; λ is then equal to unity.

Hence from (16) $\sigma^2_{U_R}$ must lie between $2(n-1)/\{n^3(s-1)\}$ and 0. Comparing the maximum
possible value with the value $2(n-1)/\{n^2(ns-n+2)\}$ of $\sigma^2_{U_N}$ in equation (7), it appears
that, if $n(s-1)$ is not too small, $\sigma_{U_R}$ will never be much greater than $\sigma_{U_N}$ (the ratio of
the two variances is at most $1 + 2/\{n(s-1)\}$), and hence the chance of rejecting will never
much exceed the specified α. We are therefore not likely to err much on the side of
overestimating the significance of observed treatment differences. On the other hand, if
there is too much discrepancy between the variances within the different blocks, $\sigma_{U_R}$
may be considerably less than $\sigma_{U_N}$, and the test may seriously underestimate
significance. The question must now be asked: how much discrepancy in block
variations will be serious?

Our procedure will be to approximate to the $U_R$-distribution by means of a
Type I Pearson curve with limits at 0 and 1, i.e. by a curve

$$p(U_R) = \text{const.} \times U_R^{m_1 - 1}(1 - U_R)^{m_2 - 1}. \tag{17}$$

$m_1$ and $m_2$ will be chosen so that the first two moments of this curve agree with the
true moments of $U_R$ given by equations (12) and (14). Of course, the distribution
of U from randomization must in fact be discontinuous, and although U must lie
between 0 and 1, these extreme values will never in general be attained. However,
although (17) may for these reasons only provide an approximate graduation to
the $U_R$ distribution, we may certainly expect it to be better than the normal theory
curve of (5), which is of the same form but does not have the correct standard
deviation. Certainly, for the normal theory to be satisfactory, we may demand
good correspondence between (5) and (17).


4. A Particular Example (Randomized Blocks). In the following example
there are n = 8 blocks and s = 4 treatments. This is the case for which T. Eden and
F. Yates performed the sampling experiment referred to earlier. $f_0$ is 21, $f_1$ is 3,
and equation (5) gives

$$p(U_N) = \text{const.} \times U_N^{1/2}(1 - U_N)^{19/2}.$$
From the Tables of the Incomplete Beta-function* or by transformation from
Fisher's z-tables, we find

$$P(U_N > .305) = .05; \qquad P(U_N > .410) = .01.$$

That is to say, .305 and .410 are the values of $U_0$ corresponding to normal theory
significance levels of α = .05 and α = .01 respectively. Now the $m_1$ and $m_2$ of the
general Type I curve (17) are connected with its first two moments by the
relations

$$m_1 = \frac{\mu_1'(\mu_1' - \mu_2')}{\mu_2' - \mu_1'^2}, \qquad m_2 = \frac{(1 - \mu_1')(\mu_1' - \mu_2')}{\mu_2' - \mu_1'^2}. \tag{18}$$

Also in the present case the randomization moments from equations (12) and
(14) are

$$\mu_1' = \frac{1}{8}, \qquad \mu_2' = \frac{5 - 2\lambda}{192}.$$

Hence

$$m_1 = \frac{19 + 2\lambda}{16(1 - \lambda)}, \qquad m_2 = \frac{7(19 + 2\lambda)}{16(1 - \lambda)}.$$

For every λ in the possible range from 1/8 to 1 there will be a different approximation
(17) to the distribution of U from randomization. For each λ, then, we can
derive, from the Incomplete Beta-function tables, an approximation to $P(U_R > U_0)$,
the true chance of the first kind of error. This has been done and the results are
plotted in Fig. 2. It will first be noticed that the risk of the first kind of error never
exceeds by much the value α at which we attempt to fix it. The maximum value
occurs at λ = .125, where for α = .05 the risk is .056 and for α = .01 it is .013. The
risk decreases as λ increases, until at λ = .192 it is actually α. It decreases further
from α to 0 as λ increases from .192 to 1. For values of λ within this range the
test will tend to underestimate significance.
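The curves of Fig. 2 can be approximated without the Incomplete Beta-function tables. In the sketch below (hypothetical helper names), $m_1$ and $m_2$ come from (18) using the randomization moments (12) and (14), and the upper tail of the Type I curve is integrated by Simpson's rule with the normalizing constant from `math.lgamma`; the numerical quadrature is a swapped-in technique standing in for the tables, and it assumes $m_2 > 1$, which holds throughout the 8 × 4 case. At $\lambda = 5/26$ (about .192) the fitted curve coincides with the normal-theory curve, so the risk equals α.

```python
import math


def type1_params(mu1, mu2):
    """m1, m2 of p(U) = const * U**(m1-1) * (1-U)**(m2-1) from raw moments, equation (18)."""
    k = (mu1 - mu2) / (mu2 - mu1 * mu1)
    return mu1 * k, (1 - mu1) * k


def upper_tail(m1, m2, u0, steps=4000):
    """P(U > u0) under the Type I (beta) curve, by Simpson's rule on [u0, 1]; assumes m2 > 1."""
    log_beta = math.lgamma(m1) + math.lgamma(m2) - math.lgamma(m1 + m2)

    def density(u):
        if u >= 1.0:
            return 0.0
        return math.exp((m1 - 1) * math.log(u) + (m2 - 1) * math.log1p(-u) - log_beta)

    h = (1.0 - u0) / steps
    total = density(u0) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(u0 + i * h)
    return total * h / 3


def true_risk(lam, u0, n=8, s=4):
    """Approximate P(U_R > u0) for randomized blocks from the moments (12) and (14)."""
    mu1 = 1 / n
    mu2 = (s + 1 - 2 * lam) / (n * n * (s - 1))
    m1, m2 = type1_params(mu1, mu2)
    return upper_tail(m1, m2, u0)


for lam in (0.125, 5 / 26, 0.242, 0.5):
    print(f"lambda = {lam:.3f}: P(U_R > .305) = {true_risk(lam, 0.305):.4f}, "
          f"P(U_R > .410) = {true_risk(lam, 0.410):.4f}")
```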

The data upon which T. Eden and F. Yates performed their sampling experiment
were derived from measurements of heights of barley. The eight values of
$\sum_j u_{ij}^2$ (i = 1, 2, ..., 8) for these data are 7628, 15,702, 22,669, 59,732, 3666, 90,593,
36,297 and 8672. By (15) these give λ = .242. $\sigma^2_{U_R}$ is .0079 against the normal
theory value .0084 of $\sigma^2_{U_N}$. From Fig. 2 the risks of the first kind of error are .046
and .0085 instead of .05 and .01. There is a slight tendency to underestimate
significance, but for practical purposes this is negligible.


5. Further Examples (Randomized Blocks). Whether the test will usually be
unbiased depends on the values of λ which we are likely to meet in practice.
An examination of uniformity trial data in different fields would therefore be of 
value. In the following I have considered four examples. 


(I) A trial with mangolds by A. Mercer and W. Hall published in Journ. Agric.
Sci. iv (1911), p. 107.


(II) A trial with wheat by A. Mercer and W. Hall in the same paper. 


* Tables of the Incomplete Beta-function, edited by Karl Pearson, Biometrika Office, University 
College, London. 
(III) A trial with oats on the field “Za Baranem” published in the Polish
journal Roczniki Nauk Rolniczych (1917) and in Landw. Versuchsstationen, xc
(1917), pp. 225-40 (authors, M. Gorski and M. Stefaniow).


(IV) An experiment giving nitrogen content in barley carried out by St Barbacki
and published in Mémoires de l'Institut National Polonais d'Économie
Rurale à Pulawy, xiv, Nr. 213 (1933), pp. 106-57.


Fig. 2. True probability of the first kind of error in Randomized Block experiment with 8 blocks
and 4 treatments, plotted against λ. (a) Normal theory α = .05. (b) Normal theory α = .01.


In each case the data as published were grouped up until the plots were
of such size and position as might be used in a Randomized Block experiment.
For these amalgamated data the necessary information for comparing the
$U_N$ and $U_R$ distributions is given in Table II.* It will be noted that for the first
three examples there is exceedingly good agreement. For the fourth, the true
risks of the first kind of error are .041 and .007 instead of .05 and .01, and the test
tends to underestimate significance.

Whereas these practical trials show no serious bias in the test, it must not be
inferred that this will always be the case. Theoretically it has been shown that the
test will underestimate significance if the block variances are too discrepant.

* n and s are in each case different from 8 and 4, so that comparison should not be made with
Fig. 2.
In the cases where the discrepancies are sufficiently large to matter, the experimenter
will probably notice something peculiar in his data and make allowances
accordingly. Sometimes, however, there may be doubt, and an investigation on
the above lines may be useful. It should be noted that the actual work involved
is not great. The total “within block” variation is computed in any case for the
analysis of variance, and it involves little extra trouble to calculate the separate
“within block” variances, from which λ is obtained. In this connection a table
may be useful showing, for different n and s, the range of values of λ for which the
bias in the test is negligible.
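One way to start such a table, offered only as a sketch: setting (16) equal to (7) gives the value $\lambda^* = 1 - (n-1)(s-1)/\{n(s-1)+2\}$ at which the two variances agree; below $\lambda^*$ the randomization variance is the larger, above it the smaller. The helper names are hypothetical, and $\lambda^*$ marks only the point of equal variances, not a formal limit for "negligible" bias. For n = 8, s = 4 it gives 5/26, about .192.

```python
from fractions import Fraction as Fr


def lam_from_block_ss(a):
    """lambda of (15) from the within-block sums of squares a_i = sum_j u_ij^2."""
    return sum(ai * ai for ai in a) / sum(a) ** 2


def lambda_star(n, s):
    """Value of lambda at which sigma^2 of U_R in (16) equals sigma^2 of U_N in (7)."""
    return 1 - Fr((n - 1) * (s - 1), n * (s - 1) + 2)


print(" n   s   min lambda   lambda*")
for n, s in [(4, 4), (6, 5), (8, 4), (10, 5), (12, 3)]:
    print(f"{n:2d}  {s:2d}   {1 / n:10.4f}   {float(lambda_star(n, s)):.4f}")
```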


TABLE II
The comparison of U_N and U_R distributions for certain uniformity trial data

Example                         I         II        III       IV
n × s                        10 × 5    12 × 3     6 × 5     4 × 4
σ² of U_N                    .00429    .00588    .01068    .02679
λ                            .1312     .1847     .2023     .4258
σ² of U_R                    .00434    .00566    .01108    .02392
P(U_R > U_0) for α = .05      .050      .048      .053      .041
P(U_R > U_0) for α = .01      .010      .009                .007
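The σ² rows of Table II, and the figures quoted for the Eden and Yates barley data, follow directly from (7) and (16) once n, s and λ are known. A minimal sketch (hypothetical function names; the inputs are the n, s and λ values quoted in the text):

```python
def var_normal(n, s):
    """sigma^2 of U_N for randomized blocks, equation (7)."""
    return 2 * (n - 1) / (n * n * (n * s - n + 2))


def var_random(n, s, lam):
    """sigma^2 of U_R for randomized blocks, equation (16)."""
    return 2 * (1 - lam) / (n * n * (s - 1))


# (n, s, lambda) as quoted in the text: Table II and the Eden and Yates barley data
cases = {"I": (10, 5, 0.1312), "II": (12, 3, 0.1847), "III": (6, 5, 0.2023),
         "IV": (4, 4, 0.4258), "Eden-Yates": (8, 4, 0.242)}
for name, (n, s, lam) in cases.items():
    print(f"{name:>10}  {n:2d} x {s}:  sigma2_N = {var_normal(n, s):.5f}  "
          f"sigma2_R = {var_random(n, s, lam):.5f}")
```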


It is of interest to note that the procedure described above, when applied to
the example of section 21 of R. A. Fisher's Design of Experiments, gives results
consistent with his. This example relates to an actual experiment in which the
question is asked whether one kind of seed is better than another. A positive
difference of means was observed, and from the t test there was a chance .02491
of getting a result as great or greater without there really being a differentiation
in seed. By considering the $2^{15}$ results which could be obtained in this experiment
by randomization on the “null” hypothesis, Fisher obtained, without any
approximation, a significance level of .02628 for the observed difference. By means
of the Type I approximation to the $U_R$ distribution I found that, corresponding
to the normal theory α = .05, the chance $P(U_R > U_0)$ for these data was .053. As I was
considering the chances of obtaining as large an absolute deviation as the one
observed, the comparison is with twice Fisher's value, i.e. .0526, and the result agrees
with Fisher's as far as the third decimal place.


6. Latin Squares (Normal Theory). In Latin Square experiments the field is
divided into s rows (i = 1, 2, ..., s) and s columns (j = 1, 2, ..., s), making $s^2$ plots.
The treatments tested (k = 1, 2, ..., s) are arranged on the field so that each falls
once and once only into every row and every column. Upon the yields of the
experiment the analysis of Table III is performed.

The test whether there are significant differences between the treatment
means involves the calculation of $z = \frac{1}{2}\log_e(v_1/v_0)$ and reference to tables based
on normal theory. This theory proceeds from the assumption that, if the treatments
are equivalent, the yield of the plot in the ith row and jth column can be written

$$x_{ij} = A + R_i + C_j + \epsilon_{ij}, \tag{19}$$

where A, $R_i$ (i = 1, 2, ..., s) and $C_j$ (j = 1, 2, ..., s) are constants, and the ε's are
normally and independently distributed about zero with the same standard
deviation.

TABLE III
Analysis of Variance for Latin Square

Source                  Degrees of Freedom    Sum of Squares    Mean Square
Between Treatments      (s − 1)               S_1               v_1
Between Rows            (s − 1)               S_2               v_2
Between Columns         (s − 1)               S_3               v_3
Residual                (s − 1)(s − 2)        S_0               v_0
Total                   (s² − 1)

As in the case of Randomized Blocks, we shall consider $U = S_1/(S_0 + S_1)$,
which is now related to z by the equation

$$U = \frac{e^{2z}}{(s-2) + e^{2z}}. \tag{20}$$

On the assumption (19), we have

$$U_N = \frac{\chi_1^2}{\chi_0^2 + \chi_1^2},$$

where $\chi_0^2$ and $\chi_1^2$ are independently distributed as $\chi^2$ with degrees of freedom
$f_0 = (s-1)(s-2)$ and $f_1 = (s-1)$ respectively. The distribution of $U_N$* is therefore

$$p(U_N) = \text{const.} \times U_N^{\frac{1}{2}(s-3)}(1 - U_N)^{\frac{1}{2}s(s-3)}, \tag{21}$$

and the moments are

$$\mu_1'(U_N) = \frac{1}{s-1}, \tag{22}$$

$$\mu_2'(U_N) = \frac{s+1}{(s-1)(s^2 - 2s + 3)}, \tag{23}$$

$$\sigma^2_{U_N} = \frac{2(s-2)}{(s-1)^2(s^2 - 2s + 3)}. \tag{24}$$
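For the Latin Square the same Beta argument applies with $f_0 = (s-1)(s-2)$ and $f_1 = s-1$. The sketch below (hypothetical helper names) evaluates (20) and checks (22) to (24) in exact arithmetic for several s.

```python
import math
from fractions import Fraction as Fr


def latin_u_from_z(z, s):
    """Equation (20): U = e^(2z) / (s - 2 + e^(2z)) for an s x s Latin square."""
    r = math.exp(2 * z)
    return r / (s - 2 + r)


def latin_normal_moments(s):
    """Mean, raw second moment and variance of U_N ~ Beta(f1/2, f0/2), f0=(s-1)(s-2), f1=s-1."""
    a, b = Fr(s - 1, 2), Fr((s - 1) * (s - 2), 2)
    m1 = a / (a + b)
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return m1, m2, m2 - m1 * m1


for s in (4, 5, 6):
    print(s, *latin_normal_moments(s))
```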


7. Latin Squares (Randomization Theory). It is now necessary to consider in
more detail the arrangement of the experiment, and to formulate precisely the
hypothesis which it is meant to test. Let the yield which the kth treatment is
capable of giving on the plot (i, j) be $x_{ij(k)}$. Then we shall suppose that the
hypothesis under test is that every plot would give the same yield, however
treated, i.e. that

$$x_{ij(k)} = x_{ij} \quad (k = 1, 2, \ldots, s). \tag{25}$$


* The suffix N is used, as before, to denote the distribution of U on normal theory. 
We are concerned only with the probability of the first kind of error, and therefore
the whole of the following analysis supposes (25) satisfied. As has been stated, the
experiment is arranged so that each treatment falls once into every row and every
column. For example, suppose the yields $x_{ij}$ form on the field the 4 × 4 square of
Fig. 3(a), and the arrangement of the treatments on the field follows the plan of
Fig. 3(b). Then the yields reclassified by treatment and row will be as in Fig. 3(c).
The treatment means are 20.25, 24.75, 23.75 and 23, and from them may be
calculated the treatment sum of squares, $S_1$, of Table III. Any other arrangement
of treatments, differing from Fig. 3(b) but satisfying the Latin Square conditions,
will lead to a different reclassification of the yields and a different $S_1$. Randomization
enters into the experiment in making the decision as to which particular
Latin Square arrangement is to be applied. A fundamental set of possible squares
(defined below) is decided upon, and from it one particular square is chosen at
random for the experiment. In order to judge, therefore, the significance of the
value of the criterion U obtained from an experiment, it is necessary to know
something of the distribution of values of U which would be generated if every
element of the fundamental set of Latin Squares were applied to the field under
essentially the same conditions. As before, attention will here be confined to the
first two moments of this distribution, and comparisons will be made with the
normal theory moments of (22), (23) and (24).

Fig. 3(a). Example of yields $x_{ij}$ on a 4 × 4 field. Fig. 3(b). Example of Latin Square arrangement
of treatments (k = 1, 2, 3, 4). Fig. 3(c). Reclassification of yields obtained on applying Fig. 3(b)
to Fig. 3(a); treatment means 20¼, 24¾, 23¾, 23.

First the fundamental set of squares must be defined. For s small, this set can
be taken to consist of all the different Latin Squares that are possible. Methods of
choosing one square at random from this total set have been given by R. A. Fisher
and F. Yates for s ≤ 6.* The necessary enumeration of squares, which would make
these methods available for s ≥ 7, has not yet been performed. Instead, for s
from 7 to 12, Yates has simply given a single example of a Latin Square, and
suggests that the fundamental set of squares to be used in randomization should
consist of all squares that can be obtained from this single square by permutations
of rows, columns and treatments. Such a set of squares has been called a transformation
set.

* See F. Yates, “The Formation of Latin Squares for Use in Field Experiments”, Emp. J.
exp. Agric. i (1933), pp. 235-44, and R. A. Fisher and F. Yates, “The 6 × 6 Latin Squares”, Proc.
Camb. phil. Soc. xxx (1934), pp. 492-507.
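The counting facts used here and in section 8 can be checked by brute force for s = 4. The sketch below (hypothetical helper names; treatments labelled 0, 1, 2, 3 rather than 1 to 4) generates the full transformation set of a square by applying every permutation of rows, columns and treatments, and counts the distinct and the reduced squares. The two starting squares, the addition tables of the cyclic group of order 4 and of the Klein four-group, are an assumption chosen because they represent the two 4 × 4 transformation sets; they give 432 squares (3 reduced) and 144 squares (1 reduced), 576 in all, each reduced square accounting for 4!·3! = 144 squares.

```python
from itertools import permutations


def transformation_set(square):
    """All distinct squares obtained from `square` by permuting rows, columns and treatments."""
    s = len(square)
    idx = range(s)
    found = set()
    for rp in permutations(idx):
        for cp in permutations(idx):
            for tp in permutations(idx):
                found.add(tuple(tuple(tp[square[rp[i]][cp[j]]] for j in idx) for i in idx))
    return found


def is_reduced(sq):
    """Reduced square: first row and first column both read 0, 1, ..., s-1."""
    s = len(sq)
    return list(sq[0]) == list(range(s)) and [row[0] for row in sq] == list(range(s))


cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]  # cyclic group of order 4
klein = [[i ^ j for j in range(4)] for i in range(4)]         # Klein four-group
for name, base in (("cyclic", cyclic), ("klein", klein)):
    tset = transformation_set(base)
    print(name, len(tset), "squares,", sum(map(is_reduced, tset)), "reduced")
```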

The mean and second moments of U for squares of a given transformation set
will first be considered. It will be found that, for a given s, the mean U for different
transformation sets is the same, but this is not the case for the second moment.
Knowing the second moment for each transformation set it is not difficult, in the
cases when s ≤ 6, to deduce the second moment over a fundamental set consisting
of all squares that are possible. We shall first consider moments over single
transformation sets.

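It is convenient to write*

$$x_{ij} = x_{..} + (x_{i.} - x_{..}) + (x_{.j} - x_{..}) + (x_{ij} - x_{i.} - x_{.j} + x_{..}) = A' + R_i + C_j + u_{ij} \ \text{(say)}, \tag{26}$$

where it will be noted that

$$\sum_i u_{ij} = 0 \ (j = 1, 2, \ldots, s); \quad \sum_j u_{ij} = 0 \ (i = 1, 2, \ldots, s); \quad \sum_i R_i = 0; \quad \sum_j C_j = 0. \tag{27}$$

Also if the yield of the kth treatment in the ith row is denoted by $x_{i(k)}$, we can write

$$x_{i(k)} = x_{..} + (x_{i.} - x_{..}) + (x_{.j} - x_{..}) + (x_{ij} - x_{i.} - x_{.j} + x_{..}),$$

where j is the column into which the kth treatment falls in the ith row. This
means that we can write

$$x_{i(k)} = A' + R_i + C_j + y_{i(k)} \ \text{(say)}. \tag{28}$$

It will be seen that only variation in the quantities $y_{i(k)}$ need be considered in the
following analysis. The possible sets of values $y_{i(k)}$ which can be obtained by
applying the transformation set of squares to the field are those which can be
obtained by applying the squares to the residuals $u_{ij}$.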
The numerator of U is

$$S_1 = s\sum_k (x_{.(k)} - x_{..})^2 = s\sum_k y_{.(k)}^2 = \frac{1}{s}\sum_k\Big(\sum_i y_{i(k)}\Big)^2. \tag{29}$$

The denominator is

$$(S_0 + S_1) = \sum_i\sum_j (x_{ij} - x_{..})^2 - s\sum_i (x_{i.} - x_{..})^2 - s\sum_j (x_{.j} - x_{..})^2 = \sum_i\sum_j u_{ij}^2, \tag{30}$$

i.e. is the same no matter what the Latin Square arrangement is. The moments of
U depend therefore only on the moments of $S_1$, and these in turn depend only on
the possible sets of y's. From (29) it is seen that $S_1$ is symmetrical with respect to
treatments. All Latin Squares of the transformation set which can be derived
from one another simply by a permutation of treatments will therefore give rise
to the same value of $S_1$, and therefore of U. Hence the moments of U over the
transformation set are the same as the moments over a set which can be derived
from an original square by permutation of rows and columns only. We shall denote
this set by Q and expectations over Q by E. Then from (29)

$$E(sS_1) = E\Big[\sum_k\sum_i y_{i(k)}^2 + \sum_k\sum_i\sum_m{}' y_{i(k)}y_{m(k)}\Big] = \sum_i\sum_j u_{ij}^2 + \sum_k\sum_i\sum_m{}'\frac{\sum_j\sum_{j'}{}' u_{ij}u_{mj'}}{s(s-1)},$$

since, for any treatment, the column j occupied in row i and the column j′ in row
m can be with equal likelihood any pair of values, except j = j′. Using the relations
(27) we obtain

$$E(sS_1) = \sum_i\sum_j u_{ij}^2 + \frac{1}{s-1}\sum_i\sum_j u_{ij}^2 = \frac{s}{s-1}\sum_i\sum_j u_{ij}^2, \qquad E(U_R) = \frac{E(S_1)}{S_0 + S_1} = \frac{1}{s-1}, \tag{31}$$

agreeing with the normal theory value of (22). For the second moment we have

$$E(s^2S_1^2) = E\Big[\sum_k\sum_{k'}\sum_i\sum_m\sum_p\sum_q y_{i(k)}\,y_{m(k)}\,y_{p(k')}\,y_{q(k')}\Big]. \tag{32}$$

* Dots indicate that means are being taken over all the values of the letter replaced by the dot.


The method used to evaluate this expectation* in the case of Randomized Blocks
can no longer be applied, since the y's in different rows are not independent. The
difficulty is greatest for terms in which k ≠ k′, e.g. the term $y_{i(1)}y_{m(1)}y_{p(2)}y_{q(2)}$.
To obtain the expectation of this, it is convenient to divide the set Q into a number
of sub-sets and first find the expectation for each sub-set. We shall put into one
sub-set all the squares of Q in which treatment 1 is allocated to the same plots.
Such a sub-set will be termed $w(j_1, j_2, \ldots, j_s)$, where $j_i$ is the column into which
treatment 1 falls in the ith row. For example, the square of Fig. 4(a) is a member
of w(2, 1, 3, 5, 4), and all the members of this set will be obtained by permuting
rows and columns in such a way that the 1's are not moved. This permutation
may be done by interchanging rows in any way, and then making the necessary
column permutation to bring the 1's back to their original positions. If, for
example, rows 1 and 4 are interchanged, then necessarily columns 2 and 5 must
be interchanged. For different sets w, the column permutation to be taken in
conjunction with any particular row permutation is, of course, different.

* For summation convention see footnote, p. 26.
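The result (31), that the mean of $U_R$ over any set Q is exactly $1/(s-1)$, can be checked by averaging U over every combination of a row permutation and a column permutation of a base square. In the sketch below the yields are hypothetical, the helper names are hypothetical, and the two base squares are simply two convenient 4 × 4 Latin squares with treatments labelled 0 to 3; U is computed directly from the treatment means, so the check also exercises (29) and (30).

```python
from itertools import permutations


def u_statistic(x, square):
    """U = S1/(S0+S1) for yields x laid out under a Latin square plan (treatments 0..s-1)."""
    s = len(x)
    g = sum(map(sum, x)) / s ** 2
    rows = [sum(r) / s for r in x]
    cols = [sum(x[i][j] for i in range(s)) / s for j in range(s)]
    # S0 + S1: sum of squared residuals after removing row and column effects, as in (30)
    denom = sum((x[i][j] - rows[i] - cols[j] + g) ** 2 for i in range(s) for j in range(s))
    tmeans = [sum(x[i][j] for i in range(s) for j in range(s) if square[i][j] == k) / s
              for k in range(s)]
    s1 = s * sum((m - g) ** 2 for m in tmeans)  # treatment sum of squares
    return s1 / denom


def mean_u_over_q(x, base):
    """Average of U over every row permutation combined with every column permutation of base."""
    s = len(base)
    vals = [u_statistic(x, [[base[r[i]][c[j]] for j in range(s)] for i in range(s)])
            for r in permutations(range(s)) for c in permutations(range(s))]
    return sum(vals) / len(vals)


x = [[12, 15, 11, 14], [9, 13, 10, 16], [14, 12, 17, 11], [10, 18, 13, 15]]  # hypothetical yields
cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]
klein = [[i ^ j for j in range(4)] for i in range(4)]
print(mean_u_over_q(x, cyclic), mean_u_over_q(x, klein), 1 / 3)
```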


Fig. 4(a). Example of 5 × 5 square belonging to w(2, 1, 3, 5, 4). Fig. 4(b). Square of same set Q
as Fig. 4(a), but belonging to w(1, 2, 3, 4, 5). This particular square may be chosen as the
fundamental square of w(1, 2, 3, 4, 5).


Instead of considering immediately expectations over the general sub-set
$w(j_1, j_2, \ldots, j_s)$, it is simpler to start with the particular set w(1, 2, ..., s), for which
treatment 1 lies down the principal diagonal. Expectations over this set will be
denoted by E′. Then

$$\sum_i\sum_m\sum_p\sum_q y_{i(1)}y_{m(1)}y_{p(1)}y_{q(1)} = \sum_i\sum_m\sum_p\sum_q u_{ii}u_{mm}u_{pp}u_{qq}, \tag{33}$$

and

$$E'\Big[\sum_i\sum_m\sum_p\sum_q y_{i(1)}y_{m(1)}y_{p(2)}y_{q(2)}\Big] = \Big(\sum_i\sum_m u_{ii}u_{mm}\Big)E'\Big[\sum_p\sum_q y_{p(2)}y_{q(2)}\Big]. \tag{34}$$

Our first problem, therefore, is to evaluate terms of the form $E'[y_{p(2)}y_{q(2)}]$. For,
having these, we can deduce (34); then by analogy we can obtain the expectations
of (33) and (34) over the more general sub-set $w(j_1, j_2, \ldots, j_s)$; finally we can
combine all the sub-sets* to obtain the expectations of the same quantities over Q.
$E(s^2S_1^2)$ will then follow from (32).

To fix ideas we shall take one member of w (1, 2, ...,s) and term it the funda- 
mental square of the set. The rows in this square will be numbered M = 1, 2, ..., s. 
(Fig. 4(b) shows this done for squares belonging to the same set Q as the square of
Fig. 4(a).) The manner in which all the squares of the set can be obtained from
the fundamental square is this: the rows can be permuted in any way: the 
columns must then be permuted in exactly the same way, in order that the 1’s 
should come back on to the principal diagonal. For short, this type of permutation 
of rows and columns will be termed a symmetrical permutation. 


* There will be the same number of different squares in each sub-set $w(j_1, j_2, \ldots, j_s)$.

First take q = p. Then, making all symmetrical permutations, treatment 2 in
row p will fall with equal frequency into all columns except the pth. Therefore

$$E'\big[y_{p(2)}^2\big] = \frac{1}{s-1}\sum_{j \ne p} u_{pj}^2. \tag{35}$$
Next consider q ≠ p. Treatment 1 will occupy in the rows p and q the positions
$u_{pp}$ and $u_{qq}$. There are now $(s^2 - 3s + 3)$ possible pairs of values for $y_{p(2)}$ and $y_{q(2)}$,
but all of these do not occur equally frequently in the set w(1, 2, ..., s). The possible
kinds of pairs are illustrated in the following diagram, the bracket denoting the
plot on which treatment 2 falls.


(i) (ii) 
(iii) (iv) 


We may have

(i) $y_{p(2)} = u_{pq}$ and $y_{q(2)} = u_{qp}$,
(ii) $y_{p(2)} = u_{pq}$ and $y_{q(2)} = u_{qj}$ (j ≠ p or q),
(iii) $y_{p(2)} = u_{pj}$ and $y_{q(2)} = u_{qp}$ (j ≠ p or q),
or (iv) $y_{p(2)} = u_{pj}$ and $y_{q(2)} = u_{qj'}$ (j and j′ ≠ p or q, j ≠ j′).

There is 1 pair of kind (i), (s − 2) of kind (ii), (s − 2) of kind (iii), and (s − 2)(s − 3)
of kind (iv),* making $(s^2 - 3s + 3)$ possible values for $y_{p(2)}y_{q(2)}$. To find the relative
frequency of these we must now consider certain properties of what we have
termed the fundamental square of w(1, 2, ..., s).

Take a particular row M of this square. In this row treatment 1 falls into
column M. There will now be some row (R say) for which treatment 2 falls into
column M:

             Column M    Column R
    Row M       1
    Row R       2

In this row treatment 1 falls into column R. There are now two possibilities:
either (a) treatment 2 in row M falls into column R, or (b) it does not. If it does,
we shall say that treatments 1 and 2 satisfy, for row M, the reversal property P.
If not, we shall say that they satisfy the property P̄. Further, taking all the rows
M (= 1, 2, ..., s), we shall denote by $n_{12}$ the number of them in which the property
P is satisfied for treatments 1 and 2. More generally, by $n_{kk'}$ will be denoted the
number of rows in which treatments k and k′ satisfy the reversal property. (For
example, in the square of Fig. 4(b), for the treatments 1 and 2, the property P
holds for M = 2 and 5, the property P̄ for M = 1, 3 and 4. Hence $n_{12}$ is 2.)

* I am throughout taking s to be greater than 3.
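The count $n_{kk'}$ is easy to compute mechanically. The sketch below applies the definition to an arbitrary Latin square, which is an assumption of this illustration: where treatment k is not down the diagonal, "column M" is read as the column that treatment k occupies in row M. Helper names are hypothetical and treatments are labelled from 0. For the cyclic 4 × 4 square (which belongs to the transformation set with three reduced squares) the sum of $n_{kk'}$ over ordered pairs $k \ne k'$ is 16, in agreement with the value quoted in section 8; it is unchanged by permuting rows and columns, and the Klein four-group square gives 48.

```python
def n_pair(square, k1, k2):
    """Rows M in which treatments k1 and k2 satisfy the reversal property P."""
    s = len(square)
    col = {(r, square[r][c]): c for r in range(s) for c in range(s)}  # (row, treatment) -> column
    count = 0
    for m in range(s):
        c = col[(m, k1)]                                     # column of k1 in row M
        r = next(i for i in range(s) if square[i][c] == k2)  # row R with k2 in that column
        if square[m][col[(r, k1)]] == k2:                    # is k2 of row M under k1 of row R?
            count += 1
    return count


def total_n(square):
    """Sum of n_kk' over ordered pairs k != k'."""
    s = len(square)
    return sum(n_pair(square, a, b) for a in range(s) for b in range(s) if a != b)


cyclic = [[(i + j) % 4 for j in range(4)] for i in range(4)]
klein = [[i ^ j for j in range(4)] for i in range(4)]
shuffled = [[cyclic[r][c] for c in (2, 0, 3, 1)] for r in (3, 1, 0, 2)]  # rows and columns permuted
print(total_n(cyclic), total_n(shuffled), total_n(klein))
```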

The relevance of the above considerations is seen when we come to find the
proportion of squares of w(1, 2, ..., s) which give $y_{p(2)}y_{q(2)} = u_{pq}u_{qp}$. This is the
situation pictured in the first figure of the diagram above. For any square of the
set w(1, 2, ..., s) which produces this situation, the four elements falling on the
plots $u_{pp}$, $u_{pq}$, $u_{qp}$ and $u_{qq}$ satisfy, for treatments 1 and 2, the reversal property.
These four elements must therefore correspond to four elements satisfying the
reversal property somewhere in the fundamental square of w(1, 2, ..., s), for
permutation of rows and columns will not destroy the property. The chance that
$y_{p(2)}y_{q(2)} = u_{pq}u_{qp}$, therefore, depends on the number of times the property holds
in the fundamental square. Now we have seen that the squares of w(1, 2, ..., s)
are obtained by any permutation of rows followed by the same permutation of
columns. There are s(s − 1) ways in which two rows of the fundamental square can
be chosen to fall on the rows p and q of the field. It is seen that the number of
these pairs of rows in which treatments 1 and 2 satisfy the reversal property P
is $n_{12}$. Hence the chance of $y_{p(2)}y_{q(2)} = u_{pq}u_{qp}$ in the set w(1, 2, ..., s) is $n_{12}/s(s-1)$.

Let us denote by $p_1$, $p_2$, $p_3$ and $p_4$ the chances of getting in w(1, 2, ..., s) the
individual pairs of values $y_{p(2)}$ and $y_{q(2)}$ referred to above as of kinds (i), (ii), (iii)
and (iv). Then we have

$$p_1 + (s-2)p_2 = \frac{1}{s-1}; \qquad p_2 = p_3; \qquad p_1 + (s-2)p_2 + (s-2)p_3 + (s-2)(s-3)p_4 = 1. \tag{36}$$

The first of these relations follows from the fact that in w(1, 2, ..., s) the chance
that $y_{p(2)} = u_{pq}$ is $1/(s-1)$. The second follows from considerations of symmetry,
and the third from the fact that the chances of all $(s^2 - 3s + 3)$ pairs must add up
to unity. Using the value of $p_1$, which we have already evaluated, we obtain
from (36)

$$p_1 = \frac{n_{12}}{s(s-1)}, \qquad p_2 = p_3 = \frac{s - n_{12}}{s(s-1)(s-2)}, \qquad p_4 = \frac{n_{12} + s(s-3)}{s(s-1)(s-2)(s-3)}. \tag{37}$$
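The solution (37) can be verified against the three relations of (36) in exact arithmetic. A minimal sketch with a hypothetical helper name, covering a range of s and every possible value of $n_{12}$ from 0 to s:

```python
from fractions import Fraction as Fr


def pair_probs(s, n12):
    """p1..p4 of (37): chances of the individual pairs of kinds (i)-(iv) in w(1, ..., s)."""
    p1 = Fr(n12, s * (s - 1))
    p2 = p3 = Fr(s - n12, s * (s - 1) * (s - 2))
    p4 = Fr(n12 + s * (s - 3), s * (s - 1) * (s - 2) * (s - 3))
    return p1, p2, p3, p4


for s in range(4, 9):
    for n12 in range(0, s + 1):
        p1, p2, p3, p4 = pair_probs(s, n12)
        assert p1 + (s - 2) * p2 == Fr(1, s - 1)                               # first of (36)
        assert p2 == p3                                                        # second of (36)
        assert p1 + (s - 2) * p2 + (s - 2) * p3 + (s - 2) * (s - 3) * p4 == 1  # third of (36)
print(pair_probs(5, 2))
```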


Now

$$E'\big[y_{p(2)}y_{q(2)}\big] = p_1u_{pq}u_{qp} + p_2u_{pq}\sum_j{}'u_{qj} + p_3u_{qp}\sum_j{}'u_{pj} + p_4\sum_j{}'\sum_{j'}{}''u_{pj}u_{qj'},$$

where $\sum'$ denotes summation of j over all values 1, 2, ..., s, excluding p and q,
and $\sum''$ denotes summation of j′ over all values 1, 2, ..., s, excluding p, q and j.
Performing these summations, remembering that $\sum_j u_{ij} = 0$, and substituting the
values of $p_1$, $p_2$, $p_3$ and $p_4$ from (37), we obtain finally

$$E'\big[y_{p(2)}y_{q(2)}\big] = \frac{1}{s(s-1)(s-2)(s-3)}\Big[s(s-3)\Big\{u_{pq}u_{qq} + u_{pp}u_{qp} - u_{pq}u_{qp} + u_{pp}u_{qq} - \sum_j u_{pj}u_{qj}\Big\} + n_{12}\Big\{(s-1)(u_{pq}u_{qq} + u_{pp}u_{qp}) + (s^2 - 3s + 1)u_{pq}u_{qp} + u_{pp}u_{qq} - \sum_j u_{pj}u_{qj}\Big\}\Big]. \tag{38}$$


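We have now obtained the expectations of $y_{p(2)}^2$ and $y_{p(2)}y_{q(2)}$ over the
symmetrical permutation set which keeps treatment 1 down the principal diagonal.
Of the two, the latter depends not only on the possible yields of the plots in rows
p and q, but also on the structure of the squares in Q. It does not matter which
particular square of Q is chosen to evaluate $n_{12}$, for the number of rows in which
treatments 1 and 2 satisfy the property P is quite independent of any permutation
of rows and columns (e.g. for any of the squares of Fig. 4, $n_{12}$ is equal to 2). The value of

$$E'\Big[\sum_p\sum_q y_{p(2)}y_{q(2)}\Big] = \sum_p E'\big[y_{p(2)}^2\big] + \sum_p\sum_q{}' E'\big[y_{p(2)}y_{q(2)}\big]$$

will follow from the necessary summations of (35) and (38). These summations
are simplified by the following relations, which follow from (27):

$$\sum_p\sum_q{}' u_{pp}u_{qq} = \Big(\sum_p u_{pp}\Big)^2 - \sum_p u_{pp}^2, \qquad \sum_p\sum_q{}' u_{pp}u_{qp} = \sum_p\sum_q{}' u_{pq}u_{qq} = -\sum_p u_{pp}^2,$$

$$\sum_p\sum_q{}' u_{pq}u_{qp} = \sum_p\sum_q u_{pq}u_{qp} - \sum_p u_{pp}^2, \qquad \sum_p\sum_q{}'\sum_j u_{pj}u_{qj} = -\sum_p\sum_j u_{pj}^2. \tag{39}$$

Using these relations, we get finally

$$E'\Big[\sum_p\sum_q y_{p(2)}y_{q(2)}\Big] = \frac{1}{s(s-1)(s-2)(s-3)}\Big[\{s(s-1)(s-3) + n_{12}\}\sum_p\sum_j u_{pj}^2 - s\{s(s-3) + (s-1)n_{12}\}\sum_p u_{pp}^2 + \{(s^2 - 3s + 1)n_{12} - s(s-3)\}\sum_p\sum_q u_{pq}u_{qp} + \{s(s-3) + n_{12}\}\Big(\sum_p u_{pp}\Big)^2\Big]. \tag{40}$$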
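Formulas (38) and (40) can be checked by direct enumeration, which is cheap for small s: the set w(1, 2, ..., s) is generated by the s! symmetrical permutations of a fundamental square with treatment 1 down the diagonal. In the sketch below (hypothetical helper names; treatment 1 is labelled 0 and "treatment 2" is whichever symbol `t2` is), the yields are random integers, double-centred as in (26), and the test squares are simple constant-diagonal squares chosen for illustration, giving both $n_{12} = 0$ and $n_{12} = 4$.

```python
import itertools
import random


def double_centre(x):
    """u_ij = x_ij - x_i. - x_.j + x_.., the residuals of (26)."""
    s = len(x)
    rm = [sum(r) / s for r in x]
    cm = [sum(x[i][j] for i in range(s)) / s for j in range(s)]
    g = sum(rm) / s
    return [[x[i][j] - rm[i] - cm[j] + g for j in range(s)] for i in range(s)]


def n12_diagonal(L, t2):
    """n_12 for a square L whose treatment '1' lies down the principal diagonal."""
    s = len(L)
    return sum(1 for m in range(s)
               if L[m][next(r for r in range(s) if L[r][m] == t2)] == t2)


def enumerate_w(L, u, t2):
    """E'[y_p(2) y_q(2)] for all p, q: average over all symmetrical permutations of L."""
    s = len(L)
    perms = list(itertools.permutations(range(s)))
    acc = [[0.0] * s for _ in range(s)]
    for pi in perms:
        # field plot (p, j) carries the treatment of plot (pi[p], pi[j]) of L
        y = [u[p][next(j for j in range(s) if L[pi[p]][pi[j]] == t2)] for p in range(s)]
        for p in range(s):
            for q in range(s):
                acc[p][q] += y[p] * y[q]
    return [[v / len(perms) for v in row] for row in acc]


def eq38(u, p, q, n):
    s = len(u)
    cross = sum(u[p][j] * u[q][j] for j in range(s))
    a = u[p][q] * u[q][q] + u[p][p] * u[q][p]
    b = u[p][p] * u[q][q] - cross
    num = (s * (s - 3) * (a - u[p][q] * u[q][p] + b)
           + n * ((s - 1) * a + (s * s - 3 * s + 1) * u[p][q] * u[q][p] + b))
    return num / (s * (s - 1) * (s - 2) * (s - 3))


def eq40(u, n):
    s = len(u)
    B = sum(v * v for row in u for v in row)
    Q = sum(u[p][p] ** 2 for p in range(s))
    A = sum(u[p][q] * u[q][p] for p in range(s) for q in range(s))
    T = sum(u[p][p] for p in range(s))
    num = ((s * (s - 1) * (s - 3) + n) * B - s * (s * (s - 3) + (s - 1) * n) * Q
           + ((s * s - 3 * s + 1) * n - s * (s - 3)) * A + (s * (s - 3) + n) * T * T)
    return num / (s * (s - 1) * (s - 2) * (s - 3))


def check(L, t2, seed=1):
    """n_12 and the largest discrepancies between enumeration and (38), (40)."""
    s = len(L)
    rng = random.Random(seed)
    u = double_centre([[rng.randint(0, 20) for _ in range(s)] for _ in range(s)])
    n = n12_diagonal(L, t2)
    E = enumerate_w(L, u, t2)
    err38 = max(abs(E[p][q] - eq38(u, p, q, n)) for p in range(s) for q in range(s) if p != q)
    err40 = abs(sum(map(sum, E)) - eq40(u, n))
    return n, err38, err40


cyc4 = [[(j - i) % 4 for j in range(4)] for i in range(4)]
klein4 = [[i ^ j for j in range(4)] for i in range(4)]
cyc5 = [[(j - i) % 5 for j in range(5)] for i in range(5)]
for L, t2 in ((cyc4, 1), (cyc4, 2), (klein4, 1), (cyc5, 1)):
    print(len(L), t2, check(L, t2))
```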
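Substituting (40) into (34), we obtain

$$E'\Big[\sum_i\sum_m\sum_p\sum_q y_{i(1)}y_{m(1)}y_{p(2)}y_{q(2)}\Big] = \frac{\{s(s-1)(s-3) + n_{12}\}X - s\{s(s-3) + (s-1)n_{12}\}Y + \{(s^2 - 3s + 1)n_{12} - s(s-3)\}Z + \{s(s-3) + n_{12}\}W}{s(s-1)(s-2)(s-3)}, \tag{41}$$

where

$$X = \Big(\sum_i u_{ii}\Big)^2\Big(\sum_i\sum_j u_{ij}^2\Big), \quad Y = \Big(\sum_i u_{ii}\Big)^2\Big(\sum_i u_{ii}^2\Big), \quad Z = \Big(\sum_i u_{ii}\Big)^2\Big(\sum_i\sum_m u_{im}u_{mi}\Big), \quad W = \Big(\sum_i u_{ii}\Big)^4.$$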


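The expectations of (33) and (41) have been evaluated for the set having treatment
1 down the principal diagonal, i.e. the set for which treatment 1 in row i
occupies the column i. For the more general set $w(j_1, j_2, \ldots, j_s)$, in which treatment
1 in row i falls on the plot $(i, j_i)$, the expectations will be of exactly the same form,
with X, Y, Z and W defined by the more general expressions:

$$X = \Big(\sum_i u_{ij_i}\Big)^2\Big(\sum_i\sum_j u_{ij}^2\Big), \quad Y = \Big(\sum_i u_{ij_i}\Big)^2\Big(\sum_i u_{ij_i}^2\Big), \quad Z = \Big(\sum_i u_{ij_i}\Big)^2\Big(\sum_i\sum_m u_{ij_m}u_{mj_i}\Big), \quad W = \Big(\sum_i u_{ij_i}\Big)^4. \tag{42}$$

Further, the same expectations for the total set Q will be obtained by replacing
X, Y, Z and W by E[X], E[Y], E[Z] and E[W], their expectations over all the
set of permutations $(j_1, j_2, \ldots, j_s)$. Thus

$$E\Big[\sum_i\sum_m\sum_p\sum_q y_{i(1)}y_{m(1)}y_{p(1)}y_{q(1)}\Big] = E[W] \tag{44}$$

and

$$E\Big[\sum_i\sum_m\sum_p\sum_q y_{i(1)}y_{m(1)}y_{p(2)}y_{q(2)}\Big] = \frac{\{s(s-1)(s-3) + n_{12}\}E[X] - s\{s(s-3) + (s-1)n_{12}\}E[Y] + \{(s^2 - 3s + 1)n_{12} - s(s-3)\}E[Z] + \{s(s-3) + n_{12}\}E[W]}{s(s-1)(s-2)(s-3)}. \tag{45}$$

Hence, since each of the s terms of (32) with k = k′ contributes E[W], and the terms with
k ≠ k′ are given by (45) with $n_{12}$ replaced by $n_{kk'}$,

$$E(s^2S_1^2) = sE[W] + \frac{1}{s(s-1)(s-2)(s-3)}\Big[s^2(s-1)(s-3)\big\{(s-1)E[X] - sE[Y] - E[Z] + E[W]\big\} + \Big(\sum_{k \ne k'} n_{kk'}\Big)\big\{E[X] - s(s-1)E[Y] + (s^2 - 3s + 1)E[Z] + E[W]\big\}\Big]. \tag{46}$$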


The derivation of these expectations of X, Y, Z and W is given in an Appendix
to this paper. The results only are presented here. They involve four symmetrical
functions of the plot yields, viz.


D= | 


tm fj 


...(42) 
he 
= 
With this notation,
1 ‘ 
1) D+3(s?—3s4+1) F 
— 3s(s—1)@G+6H]; 
[Y] 2D +(s—1) F—sG); 
E[Z|= gy 1) D+ (6-38 +3) F 


VF 
Substituting (48) into (46), 


E [s?S?)]= [2s* (s — 1) D+ (s*— + 2s? + 6s — 6) F 

(6—3) 

( 2 mw) 

— 2s (s?— 3s +3) G—2(s?—6s +6) [282 (s —1)?.D 

+ 2(2s*— 6s +43) F —2s(s~1)(s?—3s+3) 13s?— 12s + 6) H]. 
49)* 
The expectation of $U^2$ then follows from

$$E\big(U_R^2\big) = \frac{E(s^2S_1^2)}{s^2\big(\sum_i\sum_j u_{ij}^2\big)^2}. \tag{50}$$


In the case of Randomized Blocks the expectation of $U^2$ depended only on the
size of the experiment and on a single function, λ, of the plot yields. From
algebraic considerations it was possible to show that there was an upper limit to
the probability of the first kind of error when the z-test was applied. For the
Latin Square the situation is much more complicated. $E[U^2]$ depends on three
functions of the plot yields (viz. D/F, G/F and H/F), and also on the function
$\sum_{k \ne k'} n_{kk'}$ of the structure of a typical square of Q. No attempt is made here to
make any definite statement, independent of the values of these
functions, about the probability of the first kind of error. It is possible, however,
without much difficulty, to make a direct trial of the applicability of the theoretical
z-distribution in any particular instance, by calculating from the plot yields
the quantities D, F, G and H. This has been done in the following section for the
data of uniformity trials, and for some hypothetical data in which there are
certain systematic fertility gradients.
8. The 4 × 4, 5 × 5 and 6 × 6 Squares. In applying (49) to particular cases, I shall make use of the methods of choosing squares summarized conveniently by F. Yates in the Empire Journal of Experimental Agriculture.† The squares derivable from a single $s \times s$ Latin square by permutation of rows, columns and treatments are called a transformation set. The $(s!)^3$ permutations do not all lead to different squares, but however many different squares there are, each will be repeated the same number of times. A square with $1, 2, 3, \dots, s$ along the first row and $1, 2, 3, \dots, s$ down the first column is called a reduced square. From a reduced square $s!\,(s-1)!$ different squares can be generated by permuting all the rows except the first, and all the treatments. The number of different squares in a transformation set is therefore equal to the number of different reduced squares in the set multiplied by $s!\,(s-1)!$. There is, in general, more than one transformation set for a given $s$, and the different transformation sets do not necessarily contain the same number of reduced squares. To make a random choice from all the different possible squares of size $s \times s$ (giving each the same chance of being used) it is therefore necessary to give each transformation set a chance of being used proportional to the number of reduced squares it contains. In the following we shall consider the distribution of $U$, firstly over each of the transformation sets, and, secondly, over the whole set of different possible squares of size $s \times s$. Expectations over the set of all possible squares will clearly be weighted means of the expectations over the separate transformation sets, the weights being proportional to the numbers of reduced squares in the sets.

* For confirmation of this result see Appendix B.
† Loc. cit.
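The counting rule above can be checked by brute force for small $s$. The sketch below is a minimal Python illustration. The function names `count_reduced` and `count_all` are ours and do not come from Yates. It enumerates reduced squares by backtracking and multiplies the count by $s!\,(s-1)!$. The reduced-square counts it should return for $s$ = 4, 5 and 6 (4, 56 and 9408) agree with the set sizes quoted below (3 + 1 and 50 + 6) and with the column total of Table IV.

```python
from itertools import permutations
from math import factorial


def count_reduced(s):
    """Count reduced s x s Latin squares (symbols 0..s-1 in natural order
    along the first row and down the first column) by backtracking."""
    first_row = tuple(range(s))
    # In a reduced square, row i must begin with symbol i.
    candidates = {i: [p for p in permutations(range(s)) if p[0] == i]
                  for i in range(1, s)}

    def extend(rows):
        i = len(rows)
        if i == s:
            return 1
        total = 0
        for p in candidates[i]:
            # Column 0 is distinct by construction; check the other columns.
            if all(p[c] != r[c] for r in rows for c in range(1, s)):
                total += extend(rows + [p])
        return total

    return extend([first_row])


def count_all(s):
    """Each reduced square generates s!(s-1)! different squares."""
    return count_reduced(s) * factorial(s) * factorial(s - 1)


counts = {s: count_reduced(s) for s in (4, 5, 6)}
for s, n in counts.items():
    print(s, n, n * factorial(s) * factorial(s - 1))
```

Backtracking over whole rows is slow in general, but it is adequate for $s \le 6$.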
In the cases $s = 4$, 5 and 6, equations (49) and (50) give

$$s = 4:\quad \mu_2'(U) = \frac{1}{192F}\,[50F + 96D - 56G + 4H] + \frac{\sum_{k\neq k'} n_{kk'}}{9216F}\,[22F + 288D - 168G + 76H]; \qquad (51)$$

$$s = 5:\quad \mu_2'(U) = \frac{1}{1800F}\,[199F + 200D - 130G - 2H] + \frac{\sum_{k\neq k'} n_{kk'}}{360\,000F}\,[46F + 800D - 520G + 292H]; \qquad (52)$$

$$s = 6:\quad \mu_2'(U) = \frac{1}{8640F}\,[534F + 360D - 252G - 12H] + \frac{\sum_{k\neq k'} n_{kk'}}{4\,665\,600F}\,[78F + 1800D - 1260G + 804H]. \qquad (53)$$

When $s = 4$ there are two transformation sets, an illustration of each being given in Fig. 5 (a). The first set contains 3 different reduced squares and the other only 1. For the first set $\sum_{k\neq k'} n_{kk'} = 16$. This is seen in the following way. Consider, say, treatments 2 and 4 in, say, the 3rd row. Treatment 2 falls into the 3rd column. Now see in what row treatment 4 falls into the 3rd column. It is the 2nd row. Now see if in the 2nd row treatment 2 falls into the same column as treatment 4 does in the 3rd row. It does not. Therefore treatments 2 and 4 do not possess, for row 3, the reversal property, which I have referred to as P. Similarly, this property is seen not to hold for any row, and hence $n_{24} = n_{42} = 0$. In the same fashion it is seen that all the other $n$'s are 0, except $n_{12} = n_{21} = 4$ and $n_{34} = n_{43} = 4$. Thus $\sum_{k\neq k'} n_{kk'} = 16$. In the second set the reversal property holds for all pairs of treatments throughout. All the $n$'s are 4 and thus $\sum_{k\neq k'} n_{kk'} = 48$. The second moment of $U$ about zero for the two sets is therefore obtained by substituting into (51) the values $\sum_{k\neq k'} n_{kk'} = 16$ and 48, respectively. For the second moment over all possible squares we must take $\tfrac34$ of the first result plus $\tfrac14$ of the second, the numbers of reduced squares in the two sets being in the ratio 3 : 1. This is the same thing as substituting into (51) the weighted mean of the two values of $\sum_{k\neq k'} n_{kk'}$, i.e.

$$(3 \times 16 + 1 \times 48)/4 = 24.$$
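The reversal-property count described above can be automated. In this Python sketch, `reversal_sum`, `SET_I` and `SET_II` are hypothetical names. The two 4 × 4 squares are our own reconstruction of representatives of the two transformation sets, chosen to agree with the worked description above (rows 1 and 2 read 1 2 3 4 and 2 1 4 3 in both). They are not copied from the damaged Fig. 5 (a).

```python
def reversal_sum(square):
    """Sum of n_{kk'} over ordered treatment pairs k != k'.

    For row r, the pair (k, k') has the reversal property P when, taking the
    column c in which k lies in row r and the row r2 in which k' lies in
    column c, treatment k lies in row r2 in the same column as k' in row r.
    """
    s = len(square)
    col_of = [{t: c for c, t in enumerate(row)} for row in square]      # col_of[r][t]
    row_of = [{square[r][c]: r for r in range(s)} for c in range(s)]    # row_of[c][t]
    total = 0
    for r in range(s):
        for k in range(1, s + 1):
            for k2 in range(1, s + 1):
                if k == k2:
                    continue
                c = col_of[r][k]
                r2 = row_of[c][k2]
                if col_of[r2][k] == col_of[r][k2]:
                    total += 1
    return total


# Hypothetical representatives of the two 4 x 4 transformation sets.
SET_I = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 4, 2, 1], [4, 3, 1, 2]]
SET_II = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 4, 1, 2], [4, 3, 2, 1]]

n_I, n_II = reversal_sum(SET_I), reversal_sum(SET_II)
weighted = (3 * n_I + 1 * n_II) / 4    # reduced squares in the ratio 3 : 1
print(n_I, n_II, weighted)             # 16 48 24.0
```

In `SET_I` only the pairs {1, 2} and {3, 4} have the property, and they have it in every row. `SET_II` is the square in which every pair has it.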


[Fig. 5 (a). Illustrations of two 4 × 4 transformation sets, I and II. Fig. 5 (b). Illustrations of two 5 × 5 transformation sets, I and II.]


For the 5 × 5 squares there are again two transformation sets, illustrated in Fig. 5 (b). In the first set there are 50 reduced squares and $\sum_{k\neq k'} n_{kk'} = 16$. In the second set there are 6 reduced squares and $\sum_{k\neq k'} n_{kk'} = 0$. The weighted mean of $\sum_{k\neq k'} n_{kk'}$ is $(50 \times 16 + 6 \times 0)/56$, i.e. $14\tfrac{2}{7}$. The second moment of $U$ about zero, over the two sets and over all possible squares, is obtained by substituting these three numbers respectively for $\sum_{k\neq k'} n_{kk'}$ in (52).

For the 6 × 6 squares there are 22 transformation sets. Yates illustrates only 17 of these, since the other 5 can be obtained by rotating 5 of the 17 through a right angle. For our purpose also it is unnecessary to distinguish between two sets of squares, one of which is the other rotated through a right angle, since such rotation does not affect $\sum_{k\neq k'} n_{kk'}$. The numbers of reduced squares and the values of $\sum_{k\neq k'} n_{kk'}$ for each of the 17 sets are given in Table IV. Substitution into (53) gives the corresponding $\mu_2'$ for $U$. The least $\sum_{k\neq k'} n_{kk'}$ is 0, the greatest 108, and the weighted mean $33\tfrac{3}{49}$.
TABLE IV

Summary of 6 × 6 transformation sets

Yates index number          I      II     III    IV     V      VI     VII    VIII   IX
Number of reduced squares   2160   1080   1080   1080   1080   540    540    720    360
Σ n_kk′ (k ≠ k′)            20     20     16     44     28     28     76     60     60

Yates index number          X      XI     XII    XIII   XIV    XV     XVI    XVII
Number of reduced squares   180    240    120    60     40     72     36     20
Σ n_kk′ (k ≠ k′)            36     36     36     36     0      60     60     108
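The weighted means quoted for the 5 × 5 and 6 × 6 squares follow directly from the numbers of reduced squares. The short check below uses Python with exact fractions; the list and function names are ours. It recomputes the means from the 5 × 5 figures given in the text and from Table IV as laid out above.

```python
from fractions import Fraction


def weighted_mean(pairs):
    """Mean of the sum-n values, weighted by the numbers of reduced squares."""
    total = sum(reduced for reduced, _ in pairs)
    return Fraction(sum(reduced * n for reduced, n in pairs), total), total


# (number of reduced squares, sum of n_kk') for each transformation set
TABLE_IV = [(2160, 20), (1080, 20), (1080, 16), (1080, 44), (1080, 28),
            (540, 28), (540, 76), (720, 60), (360, 60),
            (180, 36), (240, 36), (120, 36), (60, 36), (40, 0),
            (72, 60), (36, 60), (20, 108)]
FIVE_BY_FIVE = [(50, 16), (6, 0)]

mean6, reduced6 = weighted_mean(TABLE_IV)
mean5, reduced5 = weighted_mean(FIVE_BY_FIVE)
print(reduced6, mean6)    # 9408 1620/49, i.e. 33 3/49
print(reduced5, mean5)    # 56 100/7, i.e. 14 2/7
```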


9. Examples (Latin Squares). In the following section the examples are numbered and described. It may here be noted that, although we may derive $s!\,(s-1)!$ different squares from a reduced square by complete permutation of treatments and permutation of all rows except the first, these squares do not all give a different $U$. For if the hypothesis that the treatments are equivalent is true, a permutation of treatments will not affect the between-treatment sum of squares, and therefore will not affect $U$. A reduced square therefore gives only $(s-1)!$ different values of $U$. When $s = 4$, this means that one transformation set gives $3 \times 3! = 18$ values and the other $1 \times 3! = 6$ values. The complete set of possible values is 24. For $s = 4$, therefore, it is not difficult to work out all the possible values and to derive from them the second moments from randomization. This was done in the first example to test the correctness of (49).
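Because the complete randomization set for $s = 4$ is small, the check described above can be repeated by brute force. The Python sketch below (all names are ours) enumerates all 576 squares of size 4 × 4 and computes $U$ for the residuals of Example I. It then groups the squares by $\sum_{k\neq k'} n_{kk'}$ and compares the resulting second moments with equation (51). Two assumptions should be flagged:

- The third row of Fig. 6 (a) is read as 3, 1, −2, −2. This is the only reading for which every row and column sums to zero.
- $G$ and $H$ are taken to be $\sum_i(\sum_j u_{ij}^2)^2 + \sum_j(\sum_i u_{ij}^2)^2$ and $\sum_{i,l}(\sum_j u_{ij}u_{lj})^2$. This is our reading of the notation of Appendix A.

```python
from fractions import Fraction
from itertools import permutations

# Residuals u_ij of Example I (Fig. 6 (a)); every row and column sums to zero.
U = [[1, 0, 3, -4],
     [2, -2, 0, 0],
     [3, 1, -2, -2],
     [-6, 1, -1, 6]]
S = len(U)


def latin_squares(s):
    """Yield every s x s Latin square on symbols 0..s-1 (brute force; fine for s = 4)."""
    perms = list(permutations(range(s)))

    def extend(rows):
        if len(rows) == s:
            yield rows
            return
        for p in perms:
            if all(p[c] != r[c] for r in rows for c in range(s)):
                yield from extend(rows + [p])

    yield from extend([])


def u_stat(square):
    """U = treatment SS / (treatment SS + residual SS) = treatment SS / sum of u^2."""
    totals = [0] * S
    for i in range(S):
        for j in range(S):
            totals[square[i][j]] += U[i][j]
    treatment_ss = Fraction(sum(t * t for t in totals), S)
    return treatment_ss / sum(x * x for row in U for x in row)


def reversal_sum(square):
    """Sum of n_kk' over ordered pairs k != k' (reversal property P)."""
    col_of = [{t: c for c, t in enumerate(row)} for row in square]
    row_of = [{square[r][c]: r for r in range(S)} for c in range(S)]
    return sum(1 for r in range(S) for k in range(S) for k2 in range(S)
               if k != k2 and col_of[row_of[col_of[r][k]][k2]][k] == col_of[r][k2])


squares = list(latin_squares(S))
pairs = [(reversal_sum(sq), u_stat(sq)) for sq in squares]
mean_u = sum(v for _, v in pairs) / len(pairs)
m2 = {n: float(sum(v * v for k, v in pairs if k == n)
               / sum(1 for k, _ in pairs if k == n)) for n in (16, 48)}
m2_all = float(sum(v * v for _, v in pairs) / len(pairs))

# Equation (51), with D, F, G, H computed from the residuals.
D = sum(x ** 4 for row in U for x in row)
F = sum(x * x for row in U for x in row) ** 2
G = (sum(sum(x * x for x in row) ** 2 for row in U)
     + sum(sum(U[i][j] ** 2 for i in range(S)) ** 2 for j in range(S)))
H = sum(sum(U[i][j] * U[l][j] for j in range(S)) ** 2 for i in range(S) for l in range(S))


def eq51(sum_n):
    return ((50 * F + 96 * D - 56 * G + 4 * H) / (192 * F)
            + sum_n * (22 * F + 288 * D - 168 * G + 76 * H) / (9216 * F))


print(len(squares), mean_u, m2, m2_all)
print(eq51(16), eq51(48), eq51(24))
```

Under these assumptions the enumeration gives a mean of exactly 1/3, the normal-theory value. The second moments should agree with (51), at about .13914, .13066 and .13702, the values shown in Table V.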


Example I (4 × 4). Artificially constructed set of values $u_{ij}$ of Fig. 6 (a).

Example II* (4 × 4). Uniformity trial giving nitrogen content in barley (St. Barbacki).

Example III* (5 × 5). Uniformity trial with oats (Gorski and Stefaniow).

Example IV* (6 × 6). Uniformity trial with wheat (Mercer and Hall).

Example V (6 × 6). An artificial set of yields $x_{ij}$, given in Fig. 6 (b), in which the fertility level runs diagonally across the field.

Example VI (6 × 6). An artificial set of yields $x_{ij}$, given in Fig. 6 (c), in which the yield on any plot is equal to the yield on the plot two columns to the right in the next row.


For each of the above examples the necessary functions of the plot yields were computed and substituted in the appropriate equation (51), (52) or (53). For the 6 × 6 squares equation (53) was not evaluated for all the transformation sets, but only for the ones giving the extreme values 0 and 108 of $\sum_{k\neq k'} n_{kk'}$. The expectation for the set of all possible squares was also evaluated. The results are given in Table V. The variance $\sigma_N^2$ of $U$ from normal theory is of course obtained from equation (24). In the last column but one is entered the ratio of the variance from randomization to the variance from normal theory, i.e. $\sigma_R^2/\sigma_N^2$.

* Examples II, III and IV are regroupings of the same data used in Examples IV, III and II of section 5 of the paper. The necessary references are given there. The Mercer and Hall wheat yields are given in Table VI.


(a)
  1   0   3  −4
  2  −2   0   0
  3   1  −2  −2
 −6   1  −1   6

(b)
 50  41  32  17  65  71
 65  50  41  32  17  65
 47  65  50  41  32  17
 23  47  65  50  41  32
 38  23  47  65  50  41
 41  38  23  47  65  50

(c)
 50  41  32  17  65  71
 65  59  50  41  32  17
 46  51  65  59  50  41
 24  19  46  51  65  59
 35  29  24  19  46  51
 32  47  35  29  24  19

Fig. 6. (a) Set of residuals $u_{ij}$ of yields on artificial 4 × 4 field.
(b) and (c) Artificial sets of yields $x_{ij}$ on 6 × 6 field.


TABLE V
Comparison of σ²_R and σ²_N for artificial examples and for data of three uniformity trials

Example  s  Σ n_kk′     μ′₂ from        σ²_R      σ²_N      σ²_R/σ²_N   P for
                        randomization                                   α = .05
I        4  16          .13914          .02803    .04040    .6938
            48          .13066          .01955    .04040    .4838
            24*         .13702          .02591    .04040    .6413
II       4  16          .13218          .02107    .04040    .5215
            48          .12417          .01306    .04040    …
            24*         .13018          .01907    .04040    .4720
III      5  16          .07809          .01559    .02083    …
            0           .07862          .01612    .02083    .7740
            14 2/7*     .07814          .01564    .02083    .7509       .0288
IV       6  0           .048215         .008215   .011852   .6931
            108         .049106         .009106   .011852   .7683
            33 3/49*    .048487         .008487   .011852   .7161       .0271
V        6  0           .05355          .01355    .011852   1.1432
            108         .05403          .01403    .011852   1.1839
            33 3/49*    .05370          .01370    .011852   1.1556      .0624
VI       6  0           .05139          .01139    .011852   .9609
            108         .05415          .01415    .011852   1.1937
            33 3/49*    .05223          .01223    .011852   1.0321      .0528

For each example two different transformation sets are considered and also the set of all possible squares. This latter is indicated by an asterisk in column 3. In the last column is given the probability of the first kind of error when the α of normal theory is .05.
For the uniformity trials II, III and IV it is seen that there is a very much greater disparity between the variances from the normal and randomization theories than was the case for Randomized Blocks. This seems to indicate that the z-test in the Latin Square is more liable to bias. Admittedly only three examples are considered here, but the data in two of them, at least, are not exceptional. Mercer and Hall's wheat data have often been used in discussions of the present character, and I give in Table VI (a) the actual grouped yields which I used in Example IV. I made slight adjustments in these yields to make the sums of rows and columns divisible by 6. This made it simpler to evaluate the residuals $u_{ij} = x_{ij} - \bar x_{i.} - \bar x_{.j} + \bar x_{..}$. Some of the residuals turned out to be three-figure numbers, and as this makes the evaluation of D, F, G and H rather heavy I rounded them off to the two-figure numbers given in Table VI (b).* These adjustments do not, I think, materially affect the ratios D/F, G/F and H/F.


TABLE VI
Mercer and Hall's wheat data

(a)
37.47  36.68  37.61  36.73  35.72  32.96
36.20  35.94  36.82  36.71  33.43  36.51
34.00  35.80  37.23  36.63  34.93  38.42
34.20  37.89  37.40  36.75  32.67  34.71
37.83  38.05  35.56  38.39  31.75  32.16
38.50  35.60  37.63  37.27  33.01  28.68

(b)
 0.7  −0.4   0.2  −0.8   1.7  −1.4
−0.3  −0.9  −0.4  −0.5  −0.3   2.4
−2.8  −1.3  −0.2  −0.9   0.9   4.3
−2.0   1.4   0.5  −0.2  −0.8   1.1
 1.6   1.5  −1.3   1.5  −1.7  −1.6
 2.8  −0.3   1.2   0.9   0.2  −4.8

Mercer and Hall give wheat yields in lb. for 500 plots of 1/500 acre each. By taking the first 18 rows and the first 18 columns of these data and regrouping them into 36 bigger square plots of size 9/500 acre, a 6 × 6 square with the yields in (a) above was obtained. In (b) are given the residuals when row and column variation is allowed for. Certain adjustments made in deriving these residuals are referred to in the text.
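The entries of Table V for Example IV can be reproduced from the residuals in (b) above. This Python sketch (names ours) reads $G$ and $H$ as $\sum_i(\sum_j u_{ij}^2)^2 + \sum_j(\sum_i u_{ij}^2)^2$ and $\sum_{i,l}(\sum_j u_{ij}u_{lj})^2$; this is an assumption on our part. It scales the residuals by 10, which leaves $D/F$, $G/F$ and $H/F$ unchanged. It then evaluates equation (53) with $534F$ and $8640F$ in the first term, which is our reading of a damaged line. The result is a consistency check under those assumptions, not the author's own computation.

```python
# Residuals of Table VI (b) multiplied by 10; D/F, G/F and H/F do not change.
U = [[7, -4, 2, -8, 17, -14],
     [-3, -9, -4, -5, -3, 24],
     [-28, -13, -2, -9, 9, 43],
     [-20, 14, 5, -2, -8, 11],
     [16, 15, -13, 15, -17, -16],
     [28, -3, 12, 9, 2, -48]]
S = len(U)
assert all(sum(row) == 0 for row in U)                               # row margins
assert all(sum(U[i][j] for i in range(S)) == 0 for j in range(S))   # column margins

D = sum(x ** 4 for row in U for x in row)
F = sum(x * x for row in U for x in row) ** 2
G = (sum(sum(x * x for x in row) ** 2 for row in U)
     + sum(sum(U[i][j] ** 2 for i in range(S)) ** 2 for j in range(S)))
H = sum(sum(U[i][j] * U[l][j] for j in range(S)) ** 2
        for i in range(S) for l in range(S))


def eq53(sum_n):
    """Second moment of U about zero for a 6 x 6 square, equation (53)."""
    return ((534 * F + 360 * D - 252 * G - 12 * H) / (8640 * F)
            + sum_n * (78 * F + 1800 * D - 1260 * G + 804 * H) / (4_665_600 * F))


SIGMA2_N = 0.011852    # normal-theory variance of U for s = 6 (Table V)
for sum_n in (0, 108, 1620 / 49):
    m2 = eq53(sum_n)
    # The mean of U is 1/(s - 1) = 0.2, so the variance is m2 - 0.04.
    print(round(m2, 6), round((m2 - 0.2 ** 2) / SIGMA2_N, 4))
```

The printed values should agree with the Example IV rows of Table V to about the sixth decimal place.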


It is an interesting point that the two systematic arrangements of Examples V and VI give good agreement between normal and randomization theories. For the 5 × 5 and 6 × 6 squares here considered, the size of $\sum_{k\neq k'} n_{kk'}$ does not seem to matter much: the expectations are much the same for all the transformation sets. For squares larger than $s = 6$ the differences between transformation sets will probably become still less important.

In the last column of Table V I have given my approximation to the true probability of the first kind of error, when the rejection level is based on the 5 per cent. point $U_\alpha$ of normal theory. These levels are obtained by approximating to the distribution of $U$ from randomization by means of a Pearson Type I curve (as in the case of Randomized Blocks). I have not done this in the examples with $s = 4$, since there are only 24 possible values of $U$ and the approximation by a continuous curve does not seem justified. In the other examples I give the risk of rejection only when the randomization set consists of all possible squares of the given size.


* I did the same thing with Examples II and III. Without these simplifications the numbers would have become uncomfortably large.
Results very little different would be obtained for the individual transformation sets. For the uniformity trials III and IV the probabilities of the first kind of error are .029 and .027 respectively, instead of the required .05. There is in these cases a definite underestimation of significance by the usual z-test. This statement must, however, be qualified by the remarks in the next section.
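One way to carry out the Type I approximation described above is sketched below. It assumes that the Type I curve is a beta density on (0, 1) whose mean and variance equal the randomization values. The mean is $1/(s-1)$ and the variance is the $\sigma_R^2$ of Table V. The author's exact fitting procedure for Randomized Blocks is described earlier in the paper and may differ, so this fitting rule is our own assumption. Under normal theory $U$ has a beta distribution with parameters $(s-1)/2$ and $(s-1)(s-2)/2$. To keep the example free of third-party libraries, the incomplete beta integral is evaluated with Simpson's rule. `first_kind_error` and its helpers are hypothetical names.

```python
import math


def beta_pdf(x, a, b, log_norm):
    """Beta(a, b) density; taken as zero at the end points (valid for a, b > 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def beta_sf(x, a, b, n=2000):
    """P(X > x) for X ~ Beta(a, b), by Simpson's rule on [x, 1] (n even)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = (1.0 - x) / n
    total = beta_pdf(x, a, b, log_norm) + beta_pdf(1.0, a, b, log_norm)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * beta_pdf(x + i * h, a, b, log_norm)
    return total * h / 3


def upper_point(alpha, a, b):
    """The point x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if beta_sf(mid, a, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def first_kind_error(s, var_rand, alpha=0.05):
    # Normal theory: U ~ Beta((s-1)/2, (s-1)(s-2)/2); U_alpha is its upper point.
    u_alpha = upper_point(alpha, (s - 1) / 2, (s - 1) * (s - 2) / 2)
    # Randomization: beta curve with mean 1/(s-1) and variance var_rand.
    mean = 1 / (s - 1)
    k = mean * (1 - mean) / var_rand - 1    # fitted a + b
    return beta_sf(u_alpha, mean * k, (1 - mean) * k)


p3 = first_kind_error(5, 0.01564)     # Example III, all possible squares
p4 = first_kind_error(6, 0.008487)    # Example IV, all possible squares
print(round(p3, 4), round(p4, 4))
```

For Examples III and IV this gives probabilities of the same order as the .0288 and .0271 of Table V. Exact agreement should not be expected if the original fit was made differently. When `var_rand` equals the normal-theory variance, the function returns .05, as it should.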


10. Summary and Conclusions. In experiments in which randomization is 
performed, the actual arrangement of treatments on the field is one chosen at 
random from a predetermined set of possible arrangements. In the present 
paper investigation has been made for Randomized Blocks and Latin Square 
experiments, into the distribution of the statistic z, generated by the application 
to the observed plot yields of the whole fundamental set of arrangements, 
assuming as true the “null” hypothesis that the treatments have no differential
effect on the plots. It was found convenient to consider, instead of z, a mono- 
tonically increasing function U of z, which is equal to the treatment sum of 
squares divided by the total of the treatment sum of squares and the residual 
sum of squares. 

Comparison of the U distribution from randomization with that from normal 
theory showed, in both Randomized Blocks and Latin Square, exact agreement 
of the means, but disagreement in the variances with consequent disagreement 
in the proportions of the distribution falling beyond certain points. Some 
uniformity trial data were used in order to see whether, in practice, these dis- 
agreements were of sufficient magnitude to be of importance. For Randomized 
Blocks the cases considered showed close enough agreement between the random- 
ization and normal theory variances of U. In each of three uniformity trials for 
Latin Square, however, the randomization variance of U was considerably 
smaller than that of the normal theory. Whether this should be taken as evidence 
of bias in the usual z-test based on normal theory, depends on the point of view 
adopted concerning the hypothetical population about which the data of an
experiment are supposed to give information. Let us consider two possibilities.


(i) We may make, from the yields of the experiment, a statistical inference only
about the situation on the particular field of the experiment, e.g. as in the present 
paper, we may be using our statistical method only to test whether all the treat- 
ments would have given identical yields on each plot of this particular field. Of 
course if we come to the conclusion that the treatments can be regarded as 
equivalent on this field, we probably make the further induction that they can be 
regarded as equivalent over some wider range of experience. Otherwise the 
experiment would be useless. However, from the present viewpoint, this further 
inference is not a statistical one in the usual sense. If the statistical part of the
inferences made from an experiment goes no farther than the experimental field, then
in cases where the variance of U from randomization on the “null” hypothesis is 
less than that from normal theory, we may say that the usual z-test underestimates 
the significance of the observed treatment differences. Where the variance of
U from randomization is the larger, there the z-test overestimates significance. 

(ii) We may, alternatively, choose to regard the inferences, drawn from the 
experimental yields to some wider experience, as being completely statistical. This 
means that we shall not only regard the statistic z calculated from the yields as
a random sample from the distribution which is obtained on the “null” hypothesis
by applying all the possible arrangements of the fundamental set to the experimental
field; we shall in turn regard this randomization distribution of z as a
random sample from a set of similar distributions, which might be obtained in 
other experiments of a similar type. These may be carried out on different fields, 
under differing weather conditions, and may be subject to different technical 
errors in the harvesting and weighing of the crop and so on. Clearly with this 
wider conception, the results derived here for three Latin Square uniformity 
trials are insufficient to give any general answer to the question of bias arising 
in the application of the z-test. We should note, however, that randomization 
ensures the agreement of the mean U with normal theory. The second moment 
of U for individual fields may, as we have seen, differ appreciably from the normal 
theory value. In a number of experiments, however, these differences may tend 
to balance out, so that on the average the discrepancy may be negligible and the 
normal theory test unbiased. 

In this connection it is of interest to recall the investigation of O. Tedin,* who 
considered the application of 5 x 5 Latin Squares to 91 uniformity trials. He took 
twelve different arrangements of the 5 x 5 square. Each arrangement was applied 
to all the 91 trials, giving for each arrangement 91 values of a criterion, which is 
practically the same as my U and which he termed a “treatment error coefficient”.
He came to the conclusion that it was dangerous to apply systematically the same 
arrangement (at least if it was of either the Diagonal or the Knight’s Move 
pattern) in every experiment and still expect the normal theory z-test to be 
unbiased. The application of the methods of the present paper to such a set 
of uniformity trials would, I think, be useful. It would indicate how far the 
process of randomization does actually eliminate bias, when the z-test is 
regarded from the second viewpoint mentioned above. 


APPENDIX 


A. Derivation of Expectations in Equation (48). In the following, $\sum'$ will be used to denote summation over all possible sets of values of the row suffixes, excluding terms in which two or more of the row suffixes are the same. Thus, for example, $\sum' u_{ij_i}u_{lj_l}u_{mj_m}$ is a summation over all $i$, $l$ and $m$, excluding terms in


* O. Tedin, “The Influence of Systematic Plot Arrangements upon the Estimate of Error in Field Experiments”, J. agric. Sci. xxi (1931), p. 191.
which two or more of $i$, $l$ and $m$ are equal. This summation, therefore, has $s(s-1)(s-2)$ terms. With this notation,
py ui;,+ = Magy; 
(2 ui; Wi Wij, 2D Mp, 
v 

+2 ta (54) 
W, X, Y and Z are thus dependent on twelve kinds of term, which are listed in column 1 of Table VII. The expectations of these terms are all derived in the same manner, and, as the algebra is somewhat long, only one example will be given in full here. For instance, consider $E(u_{ij_i}^2 u_{mj_m} u_{rj_r})$. The expectation is being taken over all sets of values $(j_1, j_2, \dots, j_s)$ which are permutations of the numbers $(1, 2, \dots, s)$. The term under present consideration involves only the three different rows $i$, $m$ and $r$, and we have only to consider what happens when $j_i$, $j_m$ and $j_r$ take all the values $(1, 2, \dots, s)$, excluding any two of them being equal. Let us start by taking $j_i$ and $j_m$ fixed. Then $j_r$ can take all values $(1, 2, \dots, s)$ except $j_i$ and $j_m$. Hence

$$\sum_{j_r \neq j_i,\, j_m} u_{rj_r} = -u_{rj_i} - u_{rj_m}.$$

Now consider $j_i$ fixed, so that $j_m$ can take all values $(1, 2, \dots, s)$ except $j_i$. Hence

$$\sum_{j_m \neq j_i} u_{mj_m}\,(u_{rj_i} + u_{rj_m}) = \Big(\sum_j u_{mj}u_{rj}\Big) - 2u_{mj_i}u_{rj_i}.$$

Finally, $j_i$ can take all values $(1, 2, \dots, s)$. Hence

$$E\big[u_{ij_i}^2 u_{mj_m} u_{rj_r}\big] = \frac{1}{s(s-1)(s-2)}\Big[2\sum_j u_{ij}^2 u_{mj} u_{rj} - \Big(\sum_j u_{ij}^2\Big)\Big(\sum_j u_{mj}u_{rj}\Big)\Big].$$
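The closed form just obtained can be checked exactly on a small example. The Python sketch below uses exact fractions; the helper names and the random 5 × 5 table are ours. It double-centres an arbitrary table so that every row and column of $u_{ij}$ sums to zero. It then averages $u_{ij_i}^2u_{mj_m}u_{rj_r}$ over all $5!$ permutations and compares the result with the formula.

```python
import random
from fractions import Fraction
from itertools import permutations


def residuals(x):
    """Double-centre x: u_ij = x_ij - xbar_i. - xbar_.j + xbar_.. (exact fractions)."""
    s = len(x)
    row = [Fraction(sum(r), s) for r in x]
    col = [Fraction(sum(x[i][j] for i in range(s)), s) for j in range(s)]
    grand = Fraction(sum(map(sum, x)), s * s)
    return [[x[i][j] - row[i] - col[j] + grand for j in range(s)] for i in range(s)]


def lhs(u, i, m, r):
    """E[u_{i j_i}^2 u_{m j_m} u_{r j_r}] over all permutations (j_1, ..., j_s)."""
    perms = list(permutations(range(len(u))))
    return sum(u[i][p[i]] ** 2 * u[m][p[m]] * u[r][p[r]] for p in perms) / len(perms)


def rhs(u, i, m, r):
    """Closed form derived in the text."""
    s = len(u)
    a = sum(u[i][j] ** 2 * u[m][j] * u[r][j] for j in range(s))
    b = sum(u[i][j] ** 2 for j in range(s)) * sum(u[m][j] * u[r][j] for j in range(s))
    return (2 * a - b) / (s * (s - 1) * (s - 2))


random.seed(1)
u = residuals([[random.randint(0, 20) for _ in range(5)] for _ in range(5)])
print(lhs(u, 0, 1, 2) == rhs(u, 0, 1, 2))    # True
```

Only the zero row sums of rows $m$ and $r$ are actually used, so the identity holds for any such table.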


The same method leads to the other entries in column 2 of Table VII, the main point of the work being the repeated use of the relation $\sum_j u_{ij} = 0$. Next we have to consider the expectations of the elements in column 1 when the summation $\sum'$ is applied to them. This involves the evaluation of summations $\sum'$ of the quantities listed in column 1 of Table VIII. Again, only one example will be given here in full, to


TABLE VIII 
(2) 
(1) Column (1) 
summed over 
VF 
Wis Urs) —VF 
ut 5) D 
(E 0,3) —D 
5) G’—D 
(= Umj Urs) 2D-—G" 
H-@’ 
& Uj; 2G’ —H 
Uj Ups) 2H + F—66’ 
Uij U,j) 3G” —6D 
j 


| 
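illustrate the method employed. The essential feature is the repeated use of the relation $\sum_i u_{ij} = 0$. For instance, take $\sum' \big(\sum_j u_{ij}u_{lj}\big)\big(\sum_j u_{ij}u_{rj}\big)$. First keep $i$ and $l$ fixed and sum $r$ over all values $(1, 2, \dots, s)$, excluding $i$ and $l$:

$$\sum' \Big(\sum_j u_{ij}u_{lj}\Big)\Big(\sum_j u_{ij}u_{rj}\Big) = \sum' \Big(\sum_j u_{ij}u_{lj}\Big)\Big(\sum_j u_{ij}(-u_{ij} - u_{lj})\Big),$$

where $\sum'$ on the right is now over $i$ and $l$ only. Now keep $i$ fixed and sum $l$ over all values $(1, 2, \dots, s)$, excluding $i$:

$$\sum' \Big(\sum_j u_{ij}u_{lj}\Big)\Big(\sum_j u_{ij}u_{rj}\Big) = -\sum_i\Big\{-\Big(\sum_j u_{ij}^2\Big)^2 + \sum_l\Big(\sum_j u_{ij}u_{lj}\Big)^2 - \Big(\sum_j u_{ij}^2\Big)^2\Big\} = 2\sum_i\Big(\sum_j u_{ij}^2\Big)^2 - \sum_i\sum_l\Big(\sum_j u_{ij}u_{lj}\Big)^2.$$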
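The same kind of exact check applies to this summation. For any table of residuals with zero column sums, $\sum'$ over distinct $i$, $l$ and $r$ should equal $2\sum_i(\sum_j u_{ij}^2)^2 - \sum_{i,l}(\sum_j u_{ij}u_{lj})^2$, the right-hand side above. The Python sketch and its function names are ours.

```python
import random
from fractions import Fraction


def residuals(x):
    """Double-centre x so that every row and column sums to zero (exact fractions)."""
    s = len(x)
    row = [Fraction(sum(r), s) for r in x]
    col = [Fraction(sum(x[i][j] for i in range(s)), s) for j in range(s)]
    grand = Fraction(sum(map(sum, x)), s * s)
    return [[x[i][j] - row[i] - col[j] + grand for j in range(s)] for i in range(s)]


def gram(u):
    """dot[i][l] = sum_j u_ij u_lj."""
    s = len(u)
    return [[sum(u[i][j] * u[l][j] for j in range(s)) for l in range(s)] for i in range(s)]


def sigma_prime(u):
    """Sum over distinct rows i, l, r of (sum_j u_ij u_lj)(sum_j u_ij u_rj)."""
    dot, s = gram(u), len(u)
    return sum(dot[i][l] * dot[i][r]
               for i in range(s) for l in range(s) for r in range(s)
               if len({i, l, r}) == 3)


def closed_form(u):
    dot, s = gram(u), len(u)
    g_prime = sum(dot[i][i] ** 2 for i in range(s))              # sum_i (sum_j u_ij^2)^2
    h = sum(dot[i][l] ** 2 for i in range(s) for l in range(s))   # sum_{i,l} (sum_j u_ij u_lj)^2
    return 2 * g_prime - h


random.seed(3)
u = residuals([[random.randint(0, 30) for _ in range(6)] for _ in range(6)])
print(sigma_prime(u) == closed_form(u))    # True
```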


The results of the other similar summations are given in column 2 of Table VIII, use being made of the following notation:

$$G' = \sum_i\Big(\sum_j u_{ij}^2\Big)^2, \qquad G'' = \sum_j\Big(\sum_i u_{ij}^2\Big)^2.$$
From the expressions of columns 2 of Tables VII and VIII, the expectations
given in column 3 of Table VII are deduced. These are the expectations of the 
different kinds of terms involved in W, X, Y and Z, and by substitution into (54) 
the expectations E[W], E[X], E[Y] and E[Z] are obtained. These are given 
in equations (48) of the paper. 


B. Confirmation of Equation (49). The algebraic processes leading to equation 
(49) are so heavy that one would feel more confident of their correctness if some 
practical test were made. This can very easily be done when $s = 4$. For then there are only 24 possible values of $S$, and these can be calculated directly from the data. This was done in Example I of section 9 and exact agreement with the theory was observed. For $s > 4$ the number of possible Latin Squares seemed too large to permit a complete investigation of this kind. It is possible, however,
to obtain some general confirmation in the following way. 

In the Randomization theory we made no assumptions about $x_{ij}$. Let us now consider what happens if we apply the reasoning of the theory to a situation where the $x$'s do actually satisfy the equation

the $\eta$'s being normal independent variates with mean zero and common standard deviation $\sigma$. One set of values $x_{ij}$ satisfying these conditions may be termed a configuration. An infinite number of such configurations is possible. We shall denote expectations over all these by $E''$.

Now consider the set $\Omega$ of possible Latin Square arrangements which can be applied to the values $x_{ij}$. Whatever individual square of $\Omega$ is applied to the set $x_{ij}$, it is clear that, in repeated configurations, the resultant values of $S$ will be distributed as $\chi^2\sigma^2$ with $s-1$ degrees of freedom, and therefore $E''[S^2] = (s^2-1)\sigma^4$. Hence, if $E[S^2]$ denotes the expectation of $S^2$ over all the Latin Squares of $\Omega$ applied to the same configuration, we must have a fortiori

$$E''\{E[S^2]\} = (s^2-1)\sigma^4, \quad\text{i.e.}\quad E''\{E[s^2S^2]\} = s^2(s^2-1)\sigma^4.$$


But $E[s^2S^2]$ is given by the right-hand side of (49). Hence, if we take $E''\{\text{right-hand side of (49)}\}$, we should obtain $s^2(s^2-1)\sigma^4$. Now it can be shown that

$$E''\{D\} = E''\Big\{\sum_{i,j} u_{ij}^4\Big\} = 3(s-1)^4\sigma^4/s^2,$$

$$E''\{F\} = (s-1)^2(s^2-2s+3)\sigma^4.$$

Substituting these for D, F, G, H in (49), we do in fact get $s^2(s^2-1)\sigma^4$. This provides a check on the accuracy of the formula, although, of course, it does not constitute a proof of its correctness.
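For $s$ = 4, 5 and 6 the same check can be made in exact arithmetic. The Python sketch below uses the coefficients of (51) to (53); some of these are damaged in this copy, and the values used are our reading. With $\sigma = 1$ it also uses $E''\{G\} = 2(s-1)^3(s+1)/s$ and $E''\{H\} = (s-1)^2(2s-1)$. We derived these ourselves for $G = G' + G''$ and $H = \sum_{i,l}(\sum_j u_{ij}u_{lj})^2$, and they are assumptions, not values taken from the paper. The sketch is therefore an illustration of the check rather than an independent verification. The first term should give $s^2(s^2-1)$. The bracket multiplying $\sum_{k\neq k'} n_{kk'}$ should vanish, since the check must hold for every square.

```python
from fractions import Fraction

# (first-bracket coefficients of F, D, G, H; its denominator in units of F;
#  second-bracket coefficients; its denominator in units of F), from (51) to (53).
COEFFS = {
    4: ((50, 96, -56, 4), 192, (22, 288, -168, 76), 9216),
    5: ((199, 200, -130, -2), 1800, (46, 800, -520, 292), 360_000),
    6: ((534, 360, -252, -12), 8640, (78, 1800, -1260, 804), 4_665_600),
}


def normal_expectations(s):
    """E'' of F, D, G, H with sigma = 1 (the G and H entries are our own derivation)."""
    t = s - 1
    return (Fraction(t ** 2 * (s * s - 2 * s + 3)),   # F = (sum u^2)^2
            Fraction(3 * t ** 4, s * s),               # D = sum u^4
            Fraction(2 * t ** 3 * (s + 1), s),         # G = G' + G''
            Fraction(t ** 2 * (2 * s - 1)))            # H


results = {}
for s, (c1, d1, c2, _) in COEFFS.items():
    ev = normal_expectations(s)
    # E[s^2 S^2] = s^2 F mu_2'(U) is linear in F, D, G, H, so E'' acts termwise.
    first = Fraction(s * s, d1) * sum(c * e for c, e in zip(c1, ev))
    second = sum(c * e for c, e in zip(c2, ev))    # must vanish
    results[s] = (first, second)
    print(s, first, s * s * (s * s - 1), second)
```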
SOME ASPECTS OF THE PROBLEM OF RANDOMIZATION
By E. S. PEARSON


1. INTRODUCTORY 


THE practical problem of mathematical statistics is to provide a conceptual 
model which will be of value to the man who needs to draw conclusions from the 
data of observation. In handling statistical data one of the commonest problems 
to be faced is that of drawing inferences from a part to the whole, from a sample 
to the population; such inferences are uncertain inferences, and it follows not 
only that in such cases the conceptual model must be constructed with the aid of 
the theory of probability but that its value to the practical man will be to some 
extent psychological. An historical study of the development of mathematical 
statistics shows an ever-increasing complexity in the structure of the abstract 
model and also an evolution of ideas as to how that model is to be of most use in 
practical application. In this course of evolution it is inevitable that many 
different suggestions should have been thrown out by mathematical statisticians 
as to the best way of linking the world of concepts with the world of experience. 
Ultimately, it is likely that the practical scientist, who may know relatively 
little mathematics but has to apply the methods of statistics in his research 
work, will play the decisive part in determining the form in which the theory of 
probability may be applied most usefully in different situations as a guide to 
judgment. But in the meantime it is necessary that amid the growing complica- 
tion of the mathematical background statisticians should attempt to keep clear 
the simple principles which in their view have the greatest claim for acceptance. 

An example of the gradual evolution of ideas is found in the changing attitude 
with which tests of goodness of fit and tests to determine whether differences are 
‘significant’ have been regarded. Perhaps one may say that 20 or 30 years ago 
the question posed by the statistician in applying such tests was often some- 
what as follows: 

“If my sample had come (a) from the population represented by my fitted 
curve, or (b) from a population whose parameters had the values given by the 
sample (and these estimates obtained from the sample cannot be very different 
from the unknown population values), what is the probability that a difference 
as great or greater than that observed would have occurred?” 

It will be seen that the situation posed was to some extent hypothetical, 
since in fact the population sampled was not represented by the sample values. 
Nevertheless, the probability measure, P, obtained as an answer to this question 
seemed to give the measure of assurance needed to make a decision. In so far as
each problem was considered in isolation from other similar problems, the basis 
for any decision taken was to a large extent psychological. 

In recent years we can follow the gradual introduction of a somewhat 
different conception. Its origin may be traced partly to the application of 
statistical methods in new fields where decisions had to be taken on evidence 
supplied by small samples, so that the differences between population values and 
sample estimates became so large that the hypothetical situation referred to 
above was seen to be noticeably unreal; and partly to the fact that in agri- 
cultural research investigations precisely similar tests were being applied again 
and again to the same type of experiment. Thus the relation was emphasized 
between (a) the probability measure, P, leading to a decision in an individual 
experiment, and (b) the expected proportion of times that a hypothesis of 
“no difference” would be wrongly rejected in the routine work of a research
station. In terms of the older approach there might be little difference in an
isolated problem between the psychological reaction to a P of .05 and a P of
.02. But where experimental procedures were being repeated continually, the
difference between a risk of mistake of 1 in 20 and of 1 in 50 might be of some 
consequence.* 

Emphasis was therefore given in statistical literature to a new idea; that of 
planning a sampling procedure and the subsequent analysis of the data
collected, in such a way as to control at any desired level the risk of making a wrong
decision—that risk which can never be entirely eliminated in any form of work 
involving sampling. This change in attitude is illustrated by the form which 
many recently constructed probability tables have taken, following R. A. 
Fisher’s suggestion. Instead of providing the statistician with the precise value 
of a probability measure, P, which he needed when regarding each problem in 
isolation, these tables are arranged so as to enable him to discover whether his 
test criterion falls below a certain “probability level”, e.g. a 10, 5, 2, or 1 per 
cent. level. If then, for example, as a usual practice he rejects the hypothesis he 
is testing when the criterion falls below the 1 per cent. level (but not otherwise), 
he knows that in the long run of his experience this action will lead to one 
wrong decision in every hundred, a frequency of error which he may be quite 
prepared to accept. 

This form of introduction of abstract theory into the world of experience has 
an obvious appeal to the practical man. If you tell him that theory enables him 
to assess the probability of a certain event in an individual trial or even to 
assess the frequency with which it would occur under somewhat hypothetical 
conditions, he may be unconvinced of the value of this theory to him. But if you 
can illustrate the statistician’s objective by two examples of the following type, 
you are much more likely to convince him of the value of statistical tools. 


* This has been brought out very clearly in questions of routine sampling in industry. 
Example 1. A frequent problem is one in which, having a sample of n values of a variable x, it is wished to determine limits between which the unknown population mean, $\xi$, almost certainly lies. Under certain conditions the statistician can here provide a rule for determining from the sample data two limits $\xi_1$ and $\xi_2$, such that the statement

$$\xi_1 < \xi < \xi_2$$

may be made with a specified measure of confidence. The details of the procedure
advocated depend on the application of the twofold principle that in making 
such a statement we are concerned, (a) to know the percentage of times it will 
be correct in long-run application under appropriate conditions, (b) to make the 
interval in some way as narrow as possible. To reduce the risk of error and to 
reduce the breadth of interval are, beyond a certain point, conflicting objectives 
and a balance must be struck between them; the statistical method shows how 
this may be done. 
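The long-run reading of the principle in Example 1 can be illustrated by simulation. The sketch assumes normal samples of $n = 10$ with unknown variance and uses the familiar Student interval. This is our choice of illustration, not a procedure specified in the text. The constant 2.262 is the tabled upper 2.5 per cent. point of $t$ with 9 degrees of freedom, and the function names are ours. The proportion of intervals that cover $\xi$ should be close to .95.

```python
import random

N = 10
T_975_9 = 2.262    # tabled upper 2.5% point of Student's t, 9 degrees of freedom


def mean_var(v):
    """Sample mean and unbiased sample variance."""
    m = sum(v) / len(v)
    return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)


def t_interval(sample):
    """Return the limits (xi_1, xi_2) for a sample of size N."""
    m, var = mean_var(sample)
    half_width = T_975_9 * (var / len(sample)) ** 0.5
    return m - half_width, m + half_width


def coverage(xi=10.0, sigma=2.0, trials=20_000, seed=1):
    """Long-run proportion of intervals that contain the true mean xi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = t_interval([rng.gauss(xi, sigma) for _ in range(N)])
        hits += lo < xi < hi
    return hits / trials


cov = coverage()
print(cov)
```

Narrowing the interval, for example by lowering the multiplier, reduces its breadth at the cost of a lower coverage. This is the trade-off described in the text.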


Example 2. Another common problem is one in which two samples are 
available and it is wished to test the hypothesis that they have been drawn from 
populations having the same means, $\xi_1 = \xi_2$. Again, the statistician can under certain conditions give a rule of procedure suggesting when the hypothesis should be rejected, and he may base this on another twofold principle: arrange so that in the long-run application of the rule, (a) the hypothesis of “no difference” in means will only be rejected when it is true on a small and known percentage of occasions; (b) the hypothesis will be rejected as often as possible when there is a true difference in means, i.e. when $\xi_1 - \xi_2 \neq 0$.
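The twofold principle of Example 2 can be illustrated in the same way. The sketch below assumes normal samples of 10 with a common variance and uses the two-sample t-test, with the tabled 5 per cent. point 2.101 for 18 degrees of freedom. This test is our illustration. The function names are hypothetical. The sketch estimates the long-run proportion of rejections in two cases. When $\xi_1 = \xi_2$ this should be near .05. When $\xi_1 - \xi_2$ equals one standard deviation it should be much larger.

```python
import random

N = 10
T_975_18 = 2.101    # tabled upper 2.5% point of Student's t, 18 degrees of freedom


def mean_var(v):
    """Sample mean and unbiased sample variance."""
    m = sum(v) / len(v)
    return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)


def rejects(x, y):
    """Two-sided two-sample t-test of xi_1 = xi_2 at the 5 per cent. level."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    pooled = ((len(x) - 1) * vx + (len(y) - 1) * vy) / (len(x) + len(y) - 2)
    t = (mx - my) / (pooled * (1 / len(x) + 1 / len(y))) ** 0.5
    return abs(t) > T_975_18


def rejection_rate(diff, trials=20_000, seed=2):
    """Long-run proportion of rejections when xi_1 - xi_2 = diff (unit variance)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = [rng.gauss(diff, 1.0) for _ in range(N)]
        y = [rng.gauss(0.0, 1.0) for _ in range(N)]
        count += rejects(x, y)
    return count / trials


alpha_hat, power_hat = rejection_rate(0.0), rejection_rate(1.0)
print(alpha_hat, power_hat)
```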

These conceptions have no doubt always been present in the minds of 
mathematical statisticians but they have only been given precise formulation 
in recent years. The principle illustrated in Example 1 forms the basis of 
J. Neyman’s work on confidence intervals and the confidence coefficient (1), and 
although presented in somewhat different form, I think, underlies R. A. Fisher’s 
conception of fiducial probability (2). The principle mentioned in Example 2 forms the basis of J. Neyman and the present writer’s work on the testing of statistical hypotheses (3), (4), but in the application of the conception (6) we are at variance with R. A. Fisher. In our view, just as in the simple problem of “interval estimation” mentioned in Example 1, it is necessary to specify the form of population distribution before the interval $\xi_1, \xi_2$ can be calculated, so it is
only possible to determine in any precise manner which is the most efficient test 
of a hypothesis if we can specify the class of alternative hypotheses. Thus, 
following quite simple principles, we may construct in the conceptual workshop 
the tests most appropriate in different precisely defined situations. It is then for 
the practical man to decide which of these situations corresponds most closely 
to that with which he is faced. 

In Fisher’s view the experimenter cannot and need not define the alternatives
to the hypothesis he is testing. Indeed, Fisher would seem to consider it to be
important to a test of significance that it should be free from the necessity of
introducing any elaborate type of background or alternatives which might be 
true. While i agree that the experimenter cannot specify all conceivable alter- 
natives to the hypothesis tested, I think that a study of the situations met with 
in practice suggests that he does in fact usually have a fairly clear idea of the 
alternatives most likely to be true, and that if the mathematical statistician 
enables him to use this knowledge in picking out the most efficient statistical 
tool, he will be grateful. If it can be shown that in the situation most likely to 
exist (e.g. normal variation) one test will detect the falsity of the hypothesis of 
“no difference” more often than any other test, the appeal in favour of its 
adoption will surely be very strong. 


2. RANDOMIZATION 


I have referred to the idea of arranging a sampling procedure so that
conclusions drawn upon application of an appropriate statistical technique will
be subject to a known and controlled risk of error. The principle of
randomization, whose introduction is largely due to R. A. Fisher, provides a
device to aid in the achievement of this objective. Most of the statistical tests
used in the more complex sampling problems have been developed on the
assumption that the variables are normally distributed, and while it is often
clear that considerable departure from normality will not seriously affect their
validity, it may be asked how far tests can be constructed which are completely
independent of any assumption of normality.

Fisher has given an interesting illustration of such a test based on random- 
ization in section 21 of his book, The Design of Experiments (5). The example is 
suggested by an investigation of Darwin’s into the growth-rate of crossed and 
self-fertilized plants. 

In the arrangement of the experiment fifteen seeds resulting from each type
of fertilization were used; denote these by A-type and B-type seeds. Fifteen pairs
of plots, say $p_{si}$ ($s = 1, 2, \ldots, 15$; $i = 1, 2$), were chosen and prepared
in such a way that the environmental conditions within each pair were as alike as
possible. Following the principle of randomization it would then be necessary to
determine at random, and for each pair independently, which site should be
occupied by the A-type and which by the B-type seed.* After the experiment was
completed, the grown plants were measured; suppose the character considered
(height at a given age) had values $a_s$ and $b_s$ respectively for the $s$th pair of
plots. Darwin’s problem was to determine whether there was any evidence that
the type of fertilization affected the vigour of the plant. Statistically, this can
be examined by testing the hypothesis, say $H_0$, that as far as the character
measured on the grown plants is concerned, the two samples of seeds have been
drawn from identical populations.


* This process of random assignment was not, of course, actually performed by Darwin. 
The character in the grown plant depends on (i) the environmental conditions,
which may be slightly different between plots $p_{s1}$ and $p_{s2}$, and (ii) some
quality inherent in the individual seed. We may now imagine in the conceptual
field a continued repetition of the experiment, fifteen seeds of each type being
randomly selected, fifteen pairs of plots being prepared and the seeds randomly
assigned. If then the method of fertilization is unconnected with subsequent
growth, a given quality of seed will be as likely to be associated with an A-type
as with a B-type seed, and owing to the randomization will be as likely to be
associated with the environmental condition of plot $p_{s1}$ as of plot $p_{s2}$.
Hence a difference

$$x_s = a_s - b_s$$

of given numerical magnitude will be as likely to be positive as negative. This
will be true independently for all fifteen pairs of plots.

It follows that the conceptual population of possible experimental results
$x_1, x_2, \ldots, x_{15}$ may be divided into an infinite number of subpopulations,
each defined by a given set of fifteen values of $|x_s|$ and each containing the
$2^{15}$ elements that will be generated by assigning to these numerical values all
possible combinations of positive and negative signs. If the hypothesis $H_0$ of
no differentiation between the A-type and B-type seed populations is true, each
of these $2^{15}$ elements is equally likely to arise.

To construct a test it is now necessary to find a rule, applicable to every one
of these subpopulations, which will divide the $2^{15}$ elements into two classes:


(1) a class I containing a proportion P of the elements, 
(2) a class II containing a proportion 1 —P of the elements. 


If then we reject the hypothesis $H_0$ of no differentiation when the element
represented by the fifteen differences actually observed, $x_1^0, x_2^0, \ldots, x_{15}^0$,
falls into class I, but not otherwise, we may be sure that the risk of rejecting $H_0$
when it is true is controlled at the value $P$: e.g. if $P$ = ·05 we should be using
what is ordinarily termed a 5 per cent. significance level. The practical question
is, of course, how to determine classes I and II. Clearly they should be so
determined that, if one type of seed in fact produces larger plants than the other,
the element represented by the observed differences $x_1^0, \ldots, x_{15}^0$ would be
likely to fall into class I, and thus $H_0$ would, correctly, be rejected. It is seen at
once that some consideration of the alternatives to the hypothesis tested is
entering into the construction of the test; it has already entered into the design
of the experiment, since the care taken to make the environmental conditions of
the pair of plots $p_{s1}$ and $p_{s2}$ similar was aimed at increasing the chance of
detecting a true difference in seed type if one exists.

Fisher’s suggestion is to put into class I the 100P per cent. of the $2^{15}$ elements
(or a number as near that figure as possible) for which the fifteen x’s have the
largest numerical mean value. Thus, for the data of Darwin’s experiment, the
values of $x_1^0, x_2^0, \ldots, x_{15}^0$ were: 49, −67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24,
75, 60, −48, giving a mean of 314/15 = 20·93.

Taking the subpopulation of $2^{15}$ = 32,768 elements generated by all possible
assignments of positive and negative signs to the fifteen values of $|x_s^0|$, Fisher
finds that in 1722, or 5·26 per cent., the numerical value of the mean, $\bar{x}$, is
greater than the observed value, 20·93. Consequently, the observed result falls
just outside the class I associated with a 5 per cent. significance level, and we
should probably not be prepared to risk rejecting the hypothesis $H_0$.
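The enumeration Fisher describes can be sketched in a few lines of Python. This is a minimal illustration, not Fisher's own procedure: the function name `sign_flip_count` is hypothetical, sums are compared instead of means (equivalent here, since every element shares the divisor 15), and ties with the observed value are counted as falling in the tail, so the count it reports need not coincide exactly with the figure quoted above.

```python
from itertools import product

# Darwin's fifteen paired differences x_s = a_s - b_s, as listed above.
DIFFS = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48]


def sign_flip_count(diffs):
    """Count sign assignments whose |sum| is at least the observed |sum|.

    Under H0 each of the 2**n assignments of signs to the |x_s| is equally
    likely, so count / 2**n is the randomization probability. Ties with the
    observed value are included in the count.
    """
    magnitudes = [abs(d) for d in diffs]
    observed = abs(sum(diffs))
    count = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        total = sum(s * m for s, m in zip(signs, magnitudes))
        if abs(total) >= observed:
            count += 1
    return count, 2 ** len(magnitudes)


if __name__ == "__main__":
    count, n_elements = sign_flip_count(DIFFS)
    print(f"{count} of {n_elements} ({100 * count / n_elements:.2f} per cent.)")
```

Because reversing every sign leaves $|\sum x_s|$ unchanged, the elements come in matched pairs, which is why the two-sided count is always even.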

The test proposed by Fisher depends upon a particular definition of class I.
It is important to note that this definition is in no sense unique. For example,
we could have put into class I the 100P per cent. of the $2^{15}$ elements for which
the geometric mean of the fifteen values $(100 + x_s)$ differed most from 100. I do
not suggest that this would be a rational classification, but it is worth reflecting
whether, if we choose to use the arithmetic mean as criterion, we are not being
influenced, perhaps unconsciously, by

(a) the knowledge that if variation is normal, a criterion based on the 
observed mean difference in samples will be most efficient in detecting a real 
population difference in seed types; 

(b) the belief that the characters measured, $a_s$ and $b_s$, are likely to be
approximately normally distributed.


If this is the case, it would seem that the usefulness of the test is in fact 
dependent on the form of the alternative hypotheses. 

Another illustration of the application of this principle of randomization has
recently been given by Fisher elsewhere (6). He supposes we have available
measures of the stature of a random sample of, say, n Frenchmen and n
Englishmen, and wish to test the hypothesis that the mean heights of the sampled
populations of Frenchmen and Englishmen are identical. Let the observations be
written as follows:

Frenchmen    $x_1, x_2, \ldots, x_n$,    mean $\bar{x}$.
Englishmen   $y_1, y_2, \ldots, y_n$,    mean $\bar{y}$.

If the 2n observations were written on cards and shuffled without regard to
nationality, it would be possible to divide them into a group A and a group B,
each containing n cards, in $(2n)!/(n!)^2$ ways. For each way of division we shall
have a mean $\bar{a}$ for group A and a mean $\bar{b}$ for group B, giving a difference

$$d = \bar{a} - \bar{b}.$$

Just as in the last example, divide these $(2n)!/(n!)^2$ possible differences into

(1) class I, containing the $P(2n)!/(n!)^2$ cases (or a number as near below this as
possible) giving the largest values of $|d|$,

(2) class II, containing the remaining cases.


Suppose P is chosen to be ·05. Then if the difference $\bar{x} - \bar{y}$ for the observed
French-English subdivision falls into class I, we may reject the hypothesis
tested, knowing that the risk we run of rejecting it when it is in fact true is ·05
or less.
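For small samples the French-English scheme can be carried out by brute force. The sketch below assumes equal sample sizes and integer-valued measurements; the function name and the stature figures in the usage lines are hypothetical, chosen only to make the example runnable.

```python
from itertools import combinations


def two_sample_randomization(x, y):
    """Two-sided randomization test for two samples of equal size n.

    Enumerates all (2n)!/(n!)**2 ways of splitting the pooled observations
    into groups A and B of n each, and counts the splits whose |d| is at
    least the observed |x_bar - y_bar|. Returns (count, number_of_splits).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes n observations in each sample")
    n = len(x)
    pooled = list(x) + list(y)
    total = sum(pooled)
    # With equal group sizes, d = (2*sum(A) - total) / n, so |d| can be
    # compared through |2*sum(A) - total| without any division.
    observed = abs(2 * sum(x) - total)
    count = splits = 0
    for group_a in combinations(range(2 * n), n):
        splits += 1
        sum_a = sum(pooled[i] for i in group_a)
        if abs(2 * sum_a - total) >= observed:
            count += 1
    return count, splits


if __name__ == "__main__":
    # Hypothetical statures in centimetres, for illustration only.
    french = [168, 171, 165, 174]
    english = [172, 177, 170, 179]
    count, splits = two_sample_randomization(french, english)
    print(f"{count} of {splits} splits give |d| at least as large as observed")
```

The observed subdivision falls into class I at level P when `count / splits` does not exceed P; since a split and its complement give the same $|d|$, the count always includes the observed split and its mirror image.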

This ingenious suggestion of Fisher’s leads to the following result: if we 
adopt the rule wherever a problem of this type arises in our statistical experi- 
ence, we shall have precise control of the risk of wrong rejection no matter what 
was the type of variation in the populations sampled. 

Of course the procedure needed to determine whether the observed sample
falls into class I or class II is very lengthy unless the samples are very small.
I am concerned, however, not with this point, but with the question of whether
there is something fundamental about the form of the test suggested, so that it
can be used as a standard against which to compare other, more expeditious
tests, such as Student’s. It seems to me that Fisher is overstating the claim of an
extremely ingenious device when he writes ((6), p. 59): “Actually, the statistician
does not carry out this very simple and very tedious process, but his conclusions
have no justification beyond the fact that they agree with those which could
have been arrived at by this elementary method.” The following example should
at any rate help to bring out some points which appear to need careful
consideration.




The figures given below represent two samples of seven observations from
two populations; they form Experiment I of Table I.

Sample 1. 45, 21, 69, 82, 79, 93, 34. Mean = $\bar{x}_1$ = 60·43. Midpoint between
extreme values = $m_1$ = 57.

Sample 2. 120, 122, 107, 127, 124, 41, 37. Mean = $\bar{x}_2$ = 96·86. Midpoint
between extreme values = $m_2$ = 82.

After pooling these fourteen numbers, they can be redivided into two groups
A and B, of seven each, in $14!/(7!)^2$ = 3432 ways. We may now ask in how
many of these ways:

(1) the difference in means of the two groups has an equal or greater
negative value than the observed

$$\bar{x}_1 - \bar{x}_2 = 60{\cdot}43 - 96{\cdot}86 = -36{\cdot}43;$$

(2) the difference in midpoints has an equal or greater negative value than
the observed

$$m_1 - m_2 = 57 - 82 = -25.$$

After a rather troublesome investigation into the possible arrangements I
find the answer to question (1) is 126 out of 3432, or 3·67 per cent., and to
question (2) is 45 out of 3432, or 1·31 per cent. It may be said, therefore, that
random assignments of the fourteen numbers into two groups of seven would give
(1) as large or a larger numerical value than that observed to the difference in
means on 7·3 per cent. of occasions, and (2) as large or a larger numerical value to
the difference in midpoints on 2·6 per cent. of occasions. It follows that in
applying this form of test to the midpoints we should be more likely to suspect a
difference in the populations sampled than in applying the test to the means.
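The "troublesome investigation" is easy to mechanize. The sketch below enumerates the 3432 splits of Experiment I and counts, for each criterion, the splits whose difference is as negative as or more negative than the observed one (the observed split included). It is an illustrative check that assumes the sample values are transcribed correctly; the function name is hypothetical, and its counts are to be compared with, not assumed equal to, the 126 and 45 reported above.

```python
from itertools import combinations

# Experiment I of Table I (samples of seven from the two rectangular populations).
SAMPLE_1 = [45, 21, 69, 82, 79, 93, 34]
SAMPLE_2 = [120, 122, 107, 127, 124, 41, 37]


def one_sided_counts(s1, s2):
    """Count splits whose mean and midpoint differences are as negative as observed.

    Returns (mean_count, midpoint_count, n_splits). Both statistics are compared
    on an integer scale: for groups of equal size n,
    mean(A) - mean(B) = (2*sum(A) - total) / n and
    mid(A) - mid(B) = ((min A + max A) - (min B + max B)) / 2.
    """
    pooled = s1 + s2
    n = len(s1)
    total = sum(pooled)
    obs_mean = 2 * sum(s1) - total
    obs_mid = (min(s1) + max(s1)) - (min(s2) + max(s2))
    mean_count = mid_count = n_splits = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        n_splits += 1
        if 2 * sum(a) - total <= obs_mean:
            mean_count += 1
        if (min(a) + max(a)) - (min(b) + max(b)) <= obs_mid:
            mid_count += 1
    return mean_count, mid_count, n_splits


if __name__ == "__main__":
    mean_count, mid_count, n_splits = one_sided_counts(SAMPLE_1, SAMPLE_2)
    print(f"means: {mean_count} of {n_splits}; midpoints: {mid_count} of {n_splits}")
```

Working with integer sums and extreme values avoids any floating-point ties when deciding whether a split is "equal to" the observed difference.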


Now of course it is quite possible that in individual cases an inferior test may
detect a real difference when a better test does not. I give below, therefore, four
further pairs of random samples from the same two populations, as well as the
results of applying the two tests.


TABLE I

Experimental Sampling Data

Experiment |   I   |   I   |  II   |  II   |  III  |  III  |  IV   |  IV   |   V   |   V
Sample     |   1   |   2   |   1   |   2   |   1   |   2   |   1   |   2   |   1   |   2
           |   45  |  120  |   29  |   50  |   14  |  104  |   47  |   60  |   67  |   47
           |   21  |  122  |   41  |  125  |   70  |   60  |    4  |   90  |   18  |   71
           |   69  |  107  |   27  |  112  |   32  |   81  |   49  |   84  |   41  |   43
           |   82  |  127  |    5  |   86  |   79  |   41  |   49  |  100  |   41  |  115
           |   79  |  124  |   27  |   40  |   87  |   69  |   23  |   93  |   65  |   66
           |   93  |   41  |   58  |   98  |   25  |   40  |   52  |   32  |    8  |  124
           |   34  |   37  |   92  |   50  |    2  |   48  |   67  |   98  |   52  |   56
Mean       | 60·43 | 96·86 | 39·86 | 80·14 | 44·14 | 63·29 | 41·57 | 79·57 | 41·71 | 74·57
Midpoint   | 57·0  | 82·0  | 48·5  | 82·5  | 44·5  | 72·0  | 35·5  | 66·0  | 37·5  | 83·5


TABLE II

Number of pairs of samples, under randomization, having negative values
for $\bar{x}_1 - \bar{x}_2$ and $m_1 - m_2$ as great as or greater than the observed pairs

           |                 Mean                  |               Midpoint
Experiment | Greater diff. | Equal diff.* | Total  | Greater diff. | Equal diff.* | Total
I          |      121      |      5       |  126   |      40       |      5       |   45
II         |       56      |      1       |   57   |      44       |     10       |   54
III        |    over 250   |     > 3      | > 253  |     100       |     41       |  141
IV         |       17      |      3       |   20   |      17       |     14       |   31
V          |       82      |      2       |   84   |      28       |     25       |   53

* Including the observed difference itself.


It will be seen that in only one case out of the five does the mean supply
stronger evidence of difference than the midpoint. Both these tests are equally
valid in the sense that, using either, we can control the risk of rejecting the
hypothesis that the populations are the same when it is in fact true. In the case
taken the population means were at 49·5 and 79·5 respectively and their two
standard deviations were the same (= 28·86).
Yet, as far as the very limited experimental evidence goes, the midpoint test
has been the more effective in detecting the presence of the real difference of
30 units in population means. The reason for this is explained at once when we
know that the population distributions were rectangular, i.e.

for population 1 any value of x between 00 and 99 was equally likely to occur;
for population 2 any value of x between 30 and 129 was equally likely to occur.*


Since the standard error of the midpoint in samples of n from a rectangular
population of standard deviation σ is

$$\sigma_m = \sigma\sqrt{\frac{6}{(n+1)(n+2)}},$$

which for n = 7 is ·289σ, while for the mean it is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$

which for n = 7 is ·378σ, we should expect on theoretical grounds that the
difference in sample midpoints, rather than in sample means, would be more
efficient in detecting real differences in population means. Such a property would
certainly appeal to the practical experimenter, were not both tests for other
reasons too lengthy to carry out as a common practice.
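The two standard-error factors can be checked numerically. The sketch evaluates both formulas for n = 7 and confirms them by simulation from a continuous rectangular population on (0, 1), whose standard deviation is $1/\sqrt{12}$; the continuous population is an assumption standing in for the integer-valued populations of the experiment, and the function names are hypothetical.

```python
import math
import random
import statistics


def se_midpoint_factor(n):
    """sigma_m / sigma for the midpoint of n observations from a rectangular population."""
    return math.sqrt(6 / ((n + 1) * (n + 2)))


def se_mean_factor(n):
    """sigma_xbar / sigma for the mean of n observations (any population)."""
    return 1 / math.sqrt(n)


def simulated_factors(n, trials=50_000, seed=1):
    """Monte Carlo estimates of the two factors, sampling from U(0, 1)."""
    rng = random.Random(seed)
    sigma = 1 / math.sqrt(12)  # standard deviation of U(0, 1)
    midpoints, means = [], []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        midpoints.append((min(xs) + max(xs)) / 2)
        means.append(sum(xs) / n)
    return statistics.pstdev(midpoints) / sigma, statistics.pstdev(means) / sigma


if __name__ == "__main__":
    print(f"formula:   midpoint {se_midpoint_factor(7):.3f}, mean {se_mean_factor(7):.3f}")
    mid, mean = simulated_factors(7)
    print(f"simulated: midpoint {mid:.3f}, mean {mean:.3f}")
```

The midpoint formula follows from the variance of the sum of the two extreme order statistics of a rectangular sample, which is why it holds only for that population shape, while the factor for the mean holds generally.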

Now of course in practice it is extremely unlikely that we should deal with
variables whose probability distribution is rectangular, but I have introduced
these examples because they seem to me to suggest that in problems of this kind
it is impossible to make a rational choice between alternative tests unless we
introduce some information beyond that contained in the sample data, i.e.
some information as to the kind of alternatives with which we are likely to be
faced.

If the variation is approximately normal and the standard deviations in the
two populations are the same, the advantages of Student’s t-test can be expressed
in simple terms which appeal to the practical statistician. Its use gives control
of the risk of rejecting the hypothesis of “no difference” when it is true, and at
the same time makes the detection of a real difference in means more probable
than does any other test.† It is certainly possible to claim that these reasons,
rather than the relation it bears to the test of Fisher’s which I have outlined,
justify its use. It is true that when variation departs from the normal the t-test
will not give quite accurate control of the risk of wrong rejection of $H_0$ (although
the error will usually be small), while the test based on randomization will
continue to do so. It is in this that the value of the randomization test lies; but,
as I have pointed out, in so far as this latter test is applied to means it cannot be
regarded as unique, and for wide departures from normality it could probably be
improved on by the use of other central estimates.


* Tippett’s Random Sampling Numbers were used; Tracts for Computers, No. XV.
† For discussion of this conception see (3) and (4).
3. RANDOMIZATION APPLIED TO THE LATIN SQUARE


The conceptual model which lies behind the design of the Latin Square
experiment leads to the following expression for the yield on the plot in the ith
row and jth column receiving the kth treatment:

$$y_{ij} = A + R_i + C_j + T_k + \eta_{ij}.$$

Here, for a given experiment with an s × s Latin Square, $A$, $R_i$, $C_j$ and
$T_k$ ($i = 1, \ldots, s$; $j = 1, \ldots, s$; $k = 1, \ldots, s$) may be regarded as constants and
the η’s as normally and independently distributed about zero with standard
deviation σ. The hypothesis, $H_0$, which it is generally wished to test is that
$T_k = 0$ ($k = 1, \ldots, s$), i.e. that there are no treatment differences.

It has always been recognized, however, that the additive row and column
contributions, $R_i$ and $C_j$, given in this equation cannot provide sufficient
elasticity to fit all forms of fertility gradient found in practice. Consequently
there is bound to be some correlation among the η’s from neighbouring plots, and
further the η’s may not be normally distributed. In a single experiment it is of
course quite impossible to decide whether the $s^2$ values of η can reasonably be
regarded as independent normal deviates. Two lines of procedure seem therefore
to have been followed.

In the first place emphasis has been laid on the importance of randomization:
in assigning the s treatments to their plots, the particular Latin Square pattern
used is chosen at random from the very many possible patterns, say $N_s$ in
number. The infinite population of results which can be conceived as obtainable
from the experiment, if $H_0$ is true, may then be divided into an infinite set of
subpopulations, each containing a finite number of elements, $N_s$. Each
subpopulation is defined by a set of $s^2$ yields, $y_{ij}$, and an element corresponds
to a partition of these yields into s treatment groups in accordance with a
particular one of the $N_s$ Latin Square patterns. The observed result following
from the Latin Square pattern chosen for the experiment represents a single one
of these elements.

If now, as far as yield is concerned, the s treatments are identical, it will follow
that each of these $N_s$ elements is equally likely to occur owing to the random
choice of pattern, even if the η’s are not normal or independent. Consequently,
as in the previous illustrations, it is only necessary to find a rule, applicable to all
sets of $s^2$ yields, which will enable us to separate from the $N_s$ elements a
suitable class I containing a proportion P of them. If this can be done, and the
hypothesis of no treatment differences is rejected when the experiment performed
gives a result falling into this class, we shall run a risk equal to P of rejecting
the hypothesis $H_0$ when it is true.

Exactly as in the simpler examples, many ways might be found of classifying
the $N_s$ partitions of the yields $y_{ij}$; the choice between them may be influenced
by expediency or by the efficiency of the resulting test in detecting the presence
of real treatment differences when they exist. From both points of view it seems
reasonable to employ the usual z-criterion, although as soon as we must depart
from the original model of the equation above, the fundamental association
between sums of squares and normal variation is blurred. Accepting this
criterion, class I will consist of the $PN_s$ partitions leading to the largest values
of z.
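A Monte Carlo version of this classification can be sketched as follows. Instead of enumerating all $N_s$ patterns, it draws patterns at random and reports the proportion giving a z at least as large as the observed one. Two assumptions should be flagged: patterns are generated by permuting the rows, columns and treatment labels of a cyclic square, which for s ≥ 4 does not reach every Latin square, and the function names and simulated yields are hypothetical.

```python
import math
import random


def random_latin_square(s, rng):
    """Permute rows, columns and symbols of the cyclic s x s square.

    Assumption: this reaches only squares isotopic to the cyclic one, which
    for s >= 4 is a proper subset of all Latin squares.
    """
    rows = rng.sample(range(s), s)
    cols = rng.sample(range(s), s)
    symbols = rng.sample(range(s), s)
    return [[symbols[(rows[i] + cols[j]) % s] for j in range(s)] for i in range(s)]


def latin_square_z(yields, pattern):
    """Fisher's z = 0.5 * ln(treatment mean square / error mean square)."""
    s = len(yields)
    cells = [(i, j) for i in range(s) for j in range(s)]
    grand = sum(yields[i][j] for i, j in cells) / (s * s)
    row_means = [sum(yields[i]) / s for i in range(s)]
    col_means = [sum(yields[i][j] for i in range(s)) / s for j in range(s)]
    trt_totals = [0.0] * s
    for i, j in cells:
        trt_totals[pattern[i][j]] += yields[i][j]
    trt_means = [t / s for t in trt_totals]
    ss_total = sum((yields[i][j] - grand) ** 2 for i, j in cells)
    ss_rows = s * sum((m - grand) ** 2 for m in row_means)
    ss_cols = s * sum((m - grand) ** 2 for m in col_means)
    ss_trt = s * sum((m - grand) ** 2 for m in trt_means)
    ss_err = ss_total - ss_rows - ss_cols - ss_trt
    ms_trt = ss_trt / (s - 1)
    ms_err = ss_err / ((s - 1) * (s - 2))
    return 0.5 * math.log(ms_trt / ms_err)


def randomization_p(yields, observed_pattern, draws=2000, seed=0):
    """Proportion of randomly drawn patterns giving z >= the observed z."""
    rng = random.Random(seed)
    z_obs = latin_square_z(yields, observed_pattern)
    hits = 0
    for _ in range(draws):
        if latin_square_z(yields, random_latin_square(len(yields), rng)) >= z_obs:
            hits += 1
    return hits / draws


if __name__ == "__main__":
    rng = random.Random(42)
    s = 5
    # Hypothetical uniformity-trial yields: a smooth gradient plus noise, no treatment effect.
    yields = [[100 + 2 * i + j + rng.gauss(0, 3) for j in range(s)] for i in range(s)]
    pattern = random_latin_square(s, rng)
    print(f"z = {latin_square_z(yields, pattern):.3f}, "
          f"randomization P = {randomization_p(yields, pattern):.3f}")
```

Sampling patterns keeps the cost fixed regardless of s, at the price of a Monte Carlo error in P; exact enumeration of the class I partitions, as discussed in the text, would remove that error but quickly becomes lengthy.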

In his paper (7), published on pp. 21-52 above, B. L. Welch has suggested a
method of determining approximately the lower limit of z bounding this class,
and he finds that, if for example P = ·05 or ·01, this limit does not necessarily
correspond exactly to the 5 and 1 per cent. significance levels found from the
usual tables of the z probability integral. Where it falls will in fact depend upon
the particular set of $s^2$ yields, $y_{ij}$. Thus, in one example taken, as few as 2·8 per
cent., and in another as few as 2·9 per cent., of the $N_s$ partitions obtained by
randomization of yields from a uniformity trial gave values of z above the normal
theory 5 per cent. level. This line of approach suggests, therefore, that if we are
to obtain a correct probability level for z from the classification of the $N_s$
partitions, it might be necessary to apply a somewhat lengthy procedure to each
set of $s^2$ yields obtained from an experiment.

The second method of attack is one which, while recognizing that the η’s may
not be exactly independent or normal, asks how far an analysis of uniformity trial
data (for which the $T_k$ in the equation are zero) suggests that the distribution of
z differs at all seriously from the normal theory form. In this case only a single z
is obtained from each experiment, and we are concerned with the distribution of z
resulting from experiments which have actually been carried out, rather than that
generated hypothetically under randomization when all possible $N_s$ partitions
are obtained from the $s^2$ yields of a single experiment. The investigation carried
out by O. Tedin (8) showed that for certain types of Latin Square pattern the
distribution of z found in 91 uniformity trials was definitely biased, but for other
patterns selected at random this bias was not evident.

It should be noted that, even if the assumptions underlying the Latin Square
equation were perfectly satisfied, there can be little doubt, as a result of B. L.
Welch’s work, that certain sets of plot yields will occur in practice from time to
time which, under randomization, will lead to distributions of z differing from
normal theory. Some of these distributions, however, would be biased in one way,
some in another, so that when they are all combined together the resulting
z-distribution should approach that of normal theory. From each randomization
set the experimenter is concerned in fact with only one value of z, and this has
been selected at random if he has chosen his Latin Square pattern randomly;
consequently, from the point of view of his long-run experience, the appropriate
probability distribution for him to use would appear to be that of normal theory.*


* Possibly we have here another instance of the difference referred to above between regarding 
a test as giving essentially a rule to be applied and justified by long-run experience, rather than a 
probability measure associated with an isolated experiment. 
On the other hand if the equation fails to represent the situation commonly
met with in the field in such a way that there is a general bias in one direction, 
the resulting under (or over) estimate of significance could be avoided by the 
lengthy process of referring the observed z in each case to its appropriate 
randomization distribution. 

To throw further light on these points it would certainly seem to be of interest
to extend Welch’s investigation by applying his results to further uniformity
trial data.

The conception of randomization illustrated in the examples given above 
is both exceedingly suggestive and often practically useful, but perhaps it should 
be described as a valuable device rather than a fundamental principle. Its adop- 
tion, when it can be followed by the calculation necessary to determine what I 
have described as the class I elements, ensures accuracy in the determination 
of the probability level of a test criterion, but without the aid of some further 
principle it cannot help us to decide which of a number of alternative tests to 
choose. It seems hardly possible to build the methods of statistics into a consistent 
whole without facing squarely the why of that choice. 
THE DISTRIBUTION OF THE RATIO OF COVARIANCE
ESTIMATES IN TWO SAMPLES DRAWN FROM 
NORMAL BIVARIATE POPULATIONS 


By H. O. HIRSCHFELD,
Harper-Adams Agricultural College, Newport, Shropshire 


It is well known that the analysis of variance of a single variable necessitates a
test of significance, for which Fisher’s z-test is the appropriate solution. However,
when problems in more than one variable arise, we must consider, in addition to
the separate variances, the question of correlation and covariation. For every
kind of analysis the subdivision of the sum of products of the deviations from the
respective means into its different components may be performed in exactly the
same way as the subdivision of the sum of squares, and what is generally known as
an “analysis of variance and covariance” can be worked out easily.

There are three types of problems for which covariance estimates may be 
used in a test:* 

(i) The question whether there is a difference between the regression co- 
efficients of the two normal populations, from each of which we assume one of the 
samples has been drawn. This test is related to the theory of residual variance in 
an analysis of variance and covariance. 

(ii) The question whether there is a difference between the correlations in the 
two above populations. 

(iii) The question whether there is any difference between corresponding
second order parameters.

In this paper we are mainly interested in question (ii), though the practical 
example to be considered later is an example for both question (i) and question 
(ii). We do not deal with question (iii). 

A difference between two correlations is most conveniently tested by Fisher’s 
z-transformation of the estimated correlation coefficient. One great advantage of 
this test is that it is entirely independent of the values of the population variances, 
i.e. it is valid when nothing whatever is known about the values of the variances. 
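The z-transformation test referred to here can be sketched as follows. This is the standard large-sample form, $z = \tfrac{1}{2}\log\{(1+r)/(1-r)\}$ with variance approximately $1/(n-3)$; the function name and the illustrative figures are hypothetical, and no use is made of the sample variances, which is exactly the property the text describes.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher's z-transformation test of the difference between two correlations.

    z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)) is approximately normal with
    variance 1 / (n - 3), whatever the population variances. Returns the
    standardized difference and its two-sided normal probability.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    d = (z1 - z2) / se
    p_two_sided = math.erfc(abs(d) / math.sqrt(2))
    return d, p_two_sided


if __name__ == "__main__":
    # Hypothetical correlations from samples of 15 each.
    d, p = compare_correlations(0.62, 15, 0.21, 15)
    print(f"standardized difference {d:.2f}, two-sided P = {p:.3f}")
```

`math.erfc(|d|/√2)` equals $2\{1 - \Phi(|d|)\}$, the two-sided normal tail area.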

There are, however, cases in which the estimated variances allow the assump- 
tion that corresponding population variances are equal. Such information, 
however, which may be derived from the variance estimates is purposely ignored 


* See, however, the paper by E. S. Pearson and S. S. Wilks(6), where several other problems are 
dealt with. 
when testing correlations estimated from such samples by Fisher’s z-transformation.

Assuming, as we may, that corresponding variances of the populations, from 
which such samples have been drawn, are equal, a difference between their co- 
variances has exactly the same meaning as a difference between their correlations. 

This is the type of problem for which a test for a difference between covari- 
ances has been developed in this paper with the help of the distribution of the 
ratio of covariance estimates.* 

We shall not enter into a detailed discussion of the appropriateness of this
test. It has been said that the distribution of the ratio of covariance estimates is
extremely complicated and that it depends on the population parameters. We
frankly admit that it essentially depends on the value of the population
correlation ρ, and that there is less scope for its common application than for the
test based on the z-transformation. But the object of this paper may be regarded
as being to make a test of this kind possible at all, by showing that the
distribution of the ratio of covariance estimates may be developed in such a way
that its dependence on ρ is of a fairly simple character, and by demonstrating
how it may be applied to practical examples.


Thus our problem, well defined by the underlying “null hypothesis”, may be
stated as follows.

Let $x_i, y_i$ ($i = 1, 2, \ldots, n'$) and $X_j, Y_j$ ($j = 1, 2, \ldots, N'$) be two samples both
drawn from the same normal bivariate population,† and let

$$v = \frac{1}{n'-1}\sum_{i=1}^{n'} (x_i - \bar{x})(y_i - \bar{y}), \qquad V = \frac{1}{N'-1}\sum_{j=1}^{N'} (X_j - \bar{X})(Y_j - \bar{Y})$$

be the respective estimates of the covariance. We then ask for the chance P (say)
of drawing two samples of the above sizes from the population such that the ratio
$v/V$ is greater (or less) than a certain value $c_0$ (say). Knowing the value of P
for every value of $c_0$, we then judge the significance of the observed ratio by
substituting it for $c_0$ and comparing the corresponding value of P with the
standard levels of ·05 and ·01.
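As an informal check on any tabulated value of P, the chance can also be approximated by simulation. The sketch below computes v and V as defined above and estimates $P(v/V > c_0)$ for two samples drawn from one normal bivariate population with unit variances. It is a Monte Carlo stand-in, not the exact expressions derived in this paper; the function names are hypothetical, and unit variances are assumed because the ratio does not depend on the scale of either variate when both samples come from the same population.

```python
import math
import random


def covariance_estimate(xs, ys):
    """Covariance estimate v = sum((x - xbar)(y - ybar)) / (n' - 1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def bivariate_sample(n, rho, rng):
    """n pairs from a standard normal bivariate population with correlation rho."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    k = math.sqrt(max(0.0, 1 - rho * rho))
    ys = [rho * x + k * rng.gauss(0, 1) for x in xs]
    return xs, ys


def simulated_p_plus(rho, c0, n1, n2, trials=10_000, seed=0):
    """Monte Carlo estimate of P(v / V > c0) for two samples from the same population."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        v = covariance_estimate(*bivariate_sample(n1, rho, rng))
        big_v = covariance_estimate(*bivariate_sample(n2, rho, rng))
        if big_v != 0 and v / big_v > c0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Samples of 15 (n' = N' = 15) and c0 = 74/26, the case used later in the paper.
    print(f"rho = 0.4: P about {simulated_p_plus(0.4, 74 / 26, 15, 15):.3f}")
```

With ρ = 1 the two variates coincide, v and V become ordinary variance estimates, and the simulation reduces to the familiar variance-ratio problem.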

In this paper we shall find the solution of this problem on the assumption that 
both n’ and N’, the numbers of items in the samples, are odd numbers. This 
restriction, unimportant for practical applications, simplifies the mathematical 


* In terms of the paper by E. S. Pearson and S. S. Wilks (6) the situation may be characterized
by saying that, among the set of all pairs of normal bivariate populations $\pi_1, \pi_2$, with standard
deviations $\sigma_{x1}, \sigma_{y1}$ and $\sigma_{x2}, \sigma_{y2}$ and correlations $\rho_1$ and $\rho_2$ respectively, for which the
relations $\sigma_{x1} = \sigma_{x2}$, $\sigma_{y1} = \sigma_{y2}$ are fulfilled, the hypothesis is tested that in addition $\rho_1 = \rho_2$.


† Throughout the paper normal populations are called identical if they have equal variances
and correlations.
expression for P considerably, and thus makes it possible to use the latter for a
test in practical examples, as will be shown in this paper.

We shall state here the mathematical expression for the chance P, giving the
proof in a mathematical appendix. This expression is given in terms of complete
and incomplete B-functions, for which tables have been provided by K. Pearson.
Using his notation we define

$$B(p,q) = \int_0^1 t^{p-1}(1-t)^{q-1}\,dt,$$

$$B_x(p,q) = \int_0^x t^{p-1}(1-t)^{q-1}\,dt,$$

$$I_x(p,q) = B_x(p,q)/B(p,q).$$
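For the integral arguments that occur here the B-functions can be evaluated exactly, which gives a convenient check on the limiting case discussed below. The sketch uses the standard identity that, for positive integers p and q, $I_x(p,q)$ equals the probability of at least p successes in p + q − 1 independent trials with success probability x. The function `limiting_p_plus` is a hypothetical helper: it assumes that when ρ² = 1 the ratio v/V is an ordinary ratio of mean squares on 2k + 2 and 2K + 2 degrees of freedom, so that the upper-tail chance is $I_X(K+1, k+1)$ with $X = (K+1)/\{(K+1) + c_0(k+1)\}$. The general formulae (1) and (2) are not reproduced.

```python
from math import comb, factorial


def complete_beta(p, q):
    """B(p, q) for positive integers p, q: (p-1)! (q-1)! / (p+q-1)!."""
    return factorial(p - 1) * factorial(q - 1) / factorial(p + q - 1)


def incomplete_beta_ratio(x, p, q):
    """I_x(p, q) = B_x(p, q) / B(p, q) for positive integers p, q.

    Uses the binomial identity: I_x(p, q) = P(at least p successes in
    p + q - 1 trials with success probability x).
    """
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(p, m + 1))


def incomplete_beta(x, p, q):
    """B_x(p, q), recovered from the ratio and the complete B-function."""
    return incomplete_beta_ratio(x, p, q) * complete_beta(p, q)


def limiting_p_plus(c0, k, big_k):
    """Assumed limit of P+ as rho**2 -> 1 (a plain variance-ratio test).

    Samples of n' = 2k + 3 and N' = 2K + 3 give mean squares on 2k + 2 and
    2K + 2 degrees of freedom, and P(v / V > c0) = I_X(K + 1, k + 1).
    """
    x = (big_k + 1) / ((big_k + 1) + c0 * (k + 1))
    return incomplete_beta_ratio(x, big_k + 1, k + 1)


if __name__ == "__main__":
    # k = K = 6 (samples of 15) and c0 = 74/26, the case used for Table I.
    print(f"{limiting_p_plus(74 / 26, 6, 6):.4f}")
```

For k = K = 6 and $c_0$ = 74/26 this gives X = ·26 and a chance of about ·030, agreeing with the entry for |ρ| = 1 in Table I.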
We are now ready to write down:

(a) The chance $P^+(\rho)$ that two independent samples, viz.

$x_i, y_i$ ($i = 1, 2, \ldots, n' = 2k + 3$),
$X_j, Y_j$ ($j = 1, 2, \ldots, N' = 2K + 3$),

which have been drawn from the same normal population with correlation
coefficient ρ, yield estimates of covariance whose ratio is greater than a certain
positive value $c_0$. We have


P+ 4-k-K-2),-1K 
x (1 {(1 1+ (1+p)-?-4, ...... (1) 
where $X = (K+1)/\{(K+1) + c_0(k+1)\}$.


(b) The chance $P^-(\rho)$ that the above samples yield estimates of covariance
whose ratio is smaller than a certain negative value $c_0$.


k+1K+1 
P-(p)=4* YY (k,k+2—g) x B(K, K+-2—p)] 


q=1 p=1 
x (1 
x {(1—p) (1+ p)1 x, (p,q) + (1 +p)? (l—p)*1 x, (2, Q)} (2) 
where X,=(K+1)(1-p)/{(K +1) (1—p) + +1) (1+p)} 


X_=(K +1) (1+p)/{(K +1) (1+p)+¢9(k + 1) (1—p)}. 

The expressions for $P(\rho)$ are thus finite weighted sums of incomplete
B-functions, the weights being complete B-functions and simple polynomials in ρ.*
For ρ = 0 the chances (1) and (2) are identical, i.e. the distribution is symmetrical
in $c_0$. As ρ² tends to 1 (i.e. as ρ tends to +1 or to −1), equation (1) approaches the
simplified form


* For integral values of p and q. 
which is the representation of Fisher’s z-test in terms of Pearson’s incomplete
B-functions. Thus, whenever it is known that there is a very high correlation 
between the variates x and y, covariance estimates might approximately be 
tested like mean squares by Fisher’s z-test. 

On the other hand, since expression (2) approaches 0 as ρ² → 1, negative values
of $c_0$ are less and less likely to occur as ρ² increases, and there will be but few
problems with reasonably correlated variates in which (2) has to be used for a test.

From formulae (1) and (2) it is clear that the chance P essentially depends on
the correlation coefficient ρ of the population, which is unknown. This property,
disadvantageous for a test of significance, is obviously inherent in the nature of
our problem. To demonstrate this dependence, eleven different ρ-values covering
the interval −1 ≤ ρ ≤ +1 were chosen, viz. ρ = 0, ±·2, ±·4, ±·6, ±·8, ±1, and for these
values were calculated the chances $P(\rho)$ (given by equation (1)) of obtaining two
samples of 15 (k = K = 6) whose ratio of covariance estimates is greater than
74/26.*

The result is given below in Table I. 




TABLE I
Giving the chance P(ρ) of drawing two random samples of 15 from a normal population
with correlation coefficient ρ such that the ratio of their covariance estimates is
greater than 74/26

    ρ  =      0      ±.2     ±.4     ±.6     ±.8     ±1
 P(ρ)  =   .110     .128    .135    .097    .054    .030


From Table I it is obvious that there is no hope of approximating to our distribution (or to a transformation of it) by a normal curve (or any suitable distribution function) independent of ρ. Therefore, in testing significance we must admit all possible values of ρ, as we have started to do in Table I. Thus three different types of results may occur:


(a) For all values of ρ the P-values are smaller than .05 (significant at 5 per cent.).

(b) For all values of ρ the P-values are greater than .05 (insignificant at 5 per cent.).

(c) For some values of ρ the P-values are smaller than .05, for other ρ-values the P-values are greater than .05. The former ρ-values will cover an interval (or a set of intervals) on −1 ≤ ρ ≤ +1, which may be called I₁; the latter ρ-values will cover the remaining part of the range −1 ≤ ρ ≤ +1, which may be called I₂.
Table I shows an example of type (c), I₁ being (approximately) the intervals .8 < |ρ| ≤ 1, I₂ being the interval 0 ≤ |ρ| < .8. To complete the test in cases like


* The value 74/26 has been chosen in connexion with a practical example to be discussed later. 
this, the best estimate r of the correlation coefficient ρ has to be calculated from the samples, and the deviation of this observed value r from the nearest value in I₂, ρ′ (say), taken as population value, has to be examined. If this deviation is insignificant (at 5 per cent.), the whole test returns an insignificant result, for it has failed to disprove the hypothesis that both samples have been drawn from the same population with correlation coefficient ρ′. If, on the other hand, the deviation is significant, so is our original test, for, whatever the value of ρ may be, the original “null hypothesis” has been disproved.

This latter part of the test is most easily worked out by Fisher’s approximate method ((2), § 35) or by entering the exact distribution of r with ρ′ as population value,* e.g. (1), p. xxxviii.

The first and main part of our test, however, consists in calculating the above chance P(ρ) (see equations (1) and (2)), or rather in finding, among all ρ-values for which P(ρ) ≥ .05 (i.e. among all ρ-values in I₂, if any), that value ρ′ which is nearest to our observed correlation coefficient.
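The classification into types (a), (b) and (c), together with the second stage of the test, can be sketched as below using the values of Table I. Several details are assumptions of this sketch rather than the author's prescription: the boundary between I₁ and I₂ is found by linear interpolation on the tabulated grid; I₂ is taken to be a single interval |ρ| < boundary, as in Table I; and the pooled estimate r from two samples is given the Fisher-z standard error 1/√(n₁ + n₂ − 4). The helper names are hypothetical.

```python
from math import atanh, sqrt

# Table I (k = K = 6, c0 = 74/26); P+ depends on rho only through rho^2
RHO = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
P = [0.110, 0.128, 0.135, 0.097, 0.054, 0.030]


def outcome_type(ps, alpha=0.05):
    """Type (a), (b) or (c) of the text, judged on the tabulated grid only."""
    significant = [p < alpha for p in ps]
    if all(significant):
        return "a"
    if not any(significant):
        return "b"
    return "c"


def crossings(rhos, ps, alpha=0.05):
    """|rho| values where P(rho) crosses alpha, by linear interpolation."""
    points = []
    for r0, p0, r1, p1 in zip(rhos, ps, rhos[1:], ps[1:]):
        if (p0 - alpha) * (p1 - alpha) < 0:
            points.append(r0 + (p0 - alpha) / (p0 - p1) * (r1 - r0))
    return points


def second_stage(r, boundary, n_total, groups=2, crit=1.96):
    """Assumes I2 = {|rho| < boundary} and I1 = {|rho| > boundary}, as in Table I.

    If |r| lies in I2 the deviation is nil; otherwise r is compared with
    rho' = boundary on Fisher's z scale, with s.d. 1/sqrt(n_total - groups - 2).
    """
    if abs(r) < boundary:
        return "insignificant"
    dev = (atanh(abs(r)) - atanh(boundary)) * sqrt(n_total - groups - 2)
    return "significant" if dev > crit else "insignificant"


if __name__ == "__main__":
    b = crossings(RHO, P)[0]
    print(outcome_type(P), round(b, 3), second_stage(0.46, b, 30))
```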

To facilitate this, trivariate tables for the 5 per cent. points of the distribution of c (or a suitable transformation of it) would have to be worked out. The best arrangement would be two-way tables, with the number of items in the larger sample as row headings and (say) twenty different positive ρ-values as column headings, proceeding page by page with the number of items in the smaller sample.†
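Each entry of such a table would be the value of c₀ at which P⁺(ρ) = .05 for a given pair of sample sizes. A minimal sketch of how one entry could be computed from formula (1) is given below. It uses bisection on c₀, which is valid because P⁺ falls steadily as c₀ increases; the function names, the upper bracket of 1000 and the number of bisection steps are choices made for illustration.

```python
from math import comb, factorial


def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def inv_beta(m, n):
    """1 / B(m, n) for positive integers m, n."""
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1))


def p_plus(rho, k, K, c0):
    """Formula (1), written with non-negative powers of (1 +- rho)."""
    X = (K + 1) / ((K + 1) + c0 * (k + 1))
    n = k + K + 2
    total = 0.0
    for q in range(1, k + 2):
        for p in range(1, K + 2):
            m = p + q
            w = ((1 + rho) ** m + (1 - rho) ** m) * (1 - rho * rho) ** (n - m)
            total += (2**m * inv_beta(k, k + 2 - q) * inv_beta(K, K + 2 - p)
                      * w * inc_beta(X, p, q))
    return total / (4**n * k * K)


def upper_point(rho, k, K, alpha=0.05, hi=1000.0, steps=60):
    """Smallest c0 > 0 with P+(rho) <= alpha, found by bisection."""
    lo = 0.0  # P+ at c0 = 0 is the chance of equal signs, at least .5
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if p_plus(rho, k, K, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for rho in (0.0, 0.4, 0.8, 1.0):
        print(f"k = K = 6, rho = {rho:.1f}: 5% point of c = {upper_point(rho, 6, 6):.3f}")
```

At ρ = ±1 the point reduces to the ordinary upper 5 per cent. point of the variance ratio for 14 and 14 degrees of freedom.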

But even without the aid of such tables there is a method of working out the test for practical examples with the help of Pearson’s tables of the incomplete B-function, and the calculations for such an example are shown below. Since Pearson’s tables have been prepared so as to answer various other purposes, much calculation work is still left to be done when applying them to our test. Though a further table (Table II) has been prepared, which facilitates the work considerably, the following method, unless both samples are very small, is still too laborious to become a common statistical practice.


PRACTICAL EXAMPLE 


In a Cambridge nutrition experiment on pigs), among many other post-slaughter results the “mean back fat” (x) and the “percentage fat from back to belly” (y) were measured for 15 hogs and 15 gilts. One problem was to see how far mean back fat, which is measured without great difficulty, provides a fair estimate of the percentage fat from back to belly and thus may be used for grading purposes.

We do not reproduce the 30+ 30 measurements here, but on examination it 
would be noted that the relationship was fairly marked for hogs but not so for 


* In this case it might become necessary to perform the test for two ρ′-values, viz. the nearest ρ-value in I₂ with ρ < r (if any) and the nearest ρ-value in I₂ with ρ > r (if any).

† The author regrets that, at the moment, he is unable to undertake this work, since he has only an adding machine at his disposal.
gilts. This sex difference is confirmed by the values of the respective sums of squares and products given below:

             (x²)       (xy)       (y²)
  Hogs      1.0975     10.488
  Gilts     0.6544      3.527


Testing the significance of a correlation between y and x (or of a regression of y on x) for hogs and gilts separately, a highly significant result is obtained for hogs, whilst the gilts regression is quite insignificant. Nevertheless, we shall see that the difference between these relationships cannot be regarded as significant. We shall test both the difference between correlation estimates (with the help of Fisher’s z-transformation) and the difference between regression estimates (by the t-test), for in this example both questions are of interest. Finally, since the variance estimates allow the assumption of equal population variances for hogs and gilts, we shall compare these tests with the test for the ratio of covariance estimates.

Let us start with the t-test. For the regression coefficients b_h and b_g of y on x (b = S(xy)/S(x²)) we obtain the respective values b_h = 9.56, b_g = 5.39.


Furthermore, the estimated s.d. of the difference b_h − b_g has to be calculated in the usual way, the work being shown below:

  Residual sum of squares (hogs)  = 135.83
      ”       ”  ”    ”   (gilts) = 283.57
                          26 × s² = 419.40
                               s² =  16.131
                        s²/1.0975 =  14.70
                        s²/.6544  =  24.65
                                     39.35

  s.d. of (b_h − b_g) = √39.35 = 6.27
  t = (9.56 − 5.39)/6.27 ≈ .665


The t-test returns an altogether insignificant difference, the chance of obtaining a difference equal to or larger than that observed being greater than 0.5.
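The t-test arithmetic above can be reproduced from the sums quoted in the text alone: 15 animals per group and 13 + 13 = 26 residual degrees of freedom. The dictionary names are illustrative. With unrounded intermediate values the sketch gives t ≈ .664; the .665 of the text follows from the rounded values of b_h and b_g.

```python
from math import sqrt

# Sums of squares and products quoted in the text (15 hogs, 15 gilts)
S_xx = {"hogs": 1.0975, "gilts": 0.6544}
S_xy = {"hogs": 10.488, "gilts": 3.527}
residual_ss = {"hogs": 135.83, "gilts": 283.57}
n = 15

# Regression coefficients of y on x: b = S(xy) / S(x^2)
b = {g: S_xy[g] / S_xx[g] for g in S_xx}

# Pooled residual mean square on 2 * (n - 2) = 26 degrees of freedom
df = 2 * (n - 2)
s2 = sum(residual_ss.values()) / df

# Standard error of b_h - b_g and the t-ratio
se = sqrt(s2 / S_xx["hogs"] + s2 / S_xx["gilts"])
t = (b["hogs"] - b["gilts"]) / se

print(f"b_h = {b['hogs']:.2f}, b_g = {b['gilts']:.2f}")
print(f"s^2 = {s2:.3f}, var(b_h - b_g) = {se**2:.2f}, t = {t:.3f} on {df} d.f.")
```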

Next we consider Fisher’s test for the difference z_h − z_g, i.e. the difference between the z-transformations of the estimated correlations r_h and r_g. We obtain (approx.):

  r_h = .65,   z_h = .78;
  diff. = .53;
  s.d. of diff. = √(1/12 + 1/12) = .408;
  normal deviate = .53/.408 = 1.3.
Again we obtain an insignificant result. The chance of obtaining a difference between correlation estimates larger than that observed is about 0.2.
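A sketch of the z-test arithmetic follows. The text does not print S(y²), so the sketch recovers it from the residual sums of squares through the usual identity residual SS = S(y²) − S(xy)²/S(x²); this step is an assumption of the sketch. The correlation r_g, which is not quoted, then comes out near .25. With these values z_h − z_g ≈ .52, slightly below the .53 quoted (presumably a matter of rounding in the original), and the normal deviate is about 1.28, i.e. 1.3. The ratio of covariance estimates used below is also computed.

```python
from math import atanh, erf, sqrt

S_xx = {"hogs": 1.0975, "gilts": 0.6544}
S_xy = {"hogs": 10.488, "gilts": 3.527}
residual_ss = {"hogs": 135.83, "gilts": 283.57}
n = 15  # animals in each group

# S(y^2) is not printed; recover it from residual SS = S(y^2) - S(xy)^2 / S(x^2)
S_yy = {g: residual_ss[g] + S_xy[g] ** 2 / S_xx[g] for g in S_xx}

r = {g: S_xy[g] / sqrt(S_xx[g] * S_yy[g]) for g in S_xx}
z = {g: atanh(r[g]) for g in S_xx}  # Fisher's z = (1/2) log{(1 + r)/(1 - r)}

se = sqrt(1 / (n - 3) + 1 / (n - 3))        # s.d. of z_h - z_g
deviate = (z["hogs"] - z["gilts"]) / se
p_two_sided = 1 - erf(deviate / sqrt(2))    # normal tail probability, both sides

# With equal sample sizes the ratio of covariance estimates is S(xy)_h / S(xy)_g
ratio = S_xy["hogs"] / S_xy["gilts"]

print(f"r_h = {r['hogs']:.3f}, r_g = {r['gilts']:.3f}, z_h - z_g = {z['hogs'] - z['gilts']:.3f}")
print(f"s.d. = {se:.3f}, normal deviate = {deviate:.2f}, P = {p_two_sided:.2f}")
print(f"ratio of covariance estimates = {ratio:.3f}")
```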

The ratio of the covariance estimates, however, viz. that of the hogs divided by that of the gilts, is nearly 3 (10.488/3.527 = 2.974, the samples being of equal size), and since we know that our test approaches the z-test as ρ² approaches 1, this ratio will be significant for large ρ². The actual result of our test is summarized in Table I, showing significance at 5 per cent. for .8 < |ρ| ≤ 1 (approx.) and insignificance for 0 ≤ |ρ| < .8.*

Calculating now from equation (3) the best estimate of our correlation coefficient common to hogs and gilts, we obtain

r = .46.
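Equation (3) is not reproduced in this section. Assuming it is the usual pooled within-sample product-moment coefficient, which is appropriate when, as here, equal population variances may be assumed, the value r = .46 can be reproduced. S(y²) is recovered from the residual sums of squares through residual SS = S(y²) − S(xy)²/S(x²).

```python
from math import sqrt

S_xx = [1.0975, 0.6544]      # hogs, gilts
S_xy = [10.488, 3.527]
residual_ss = [135.83, 283.57]

# S(y^2) recovered from residual SS = S(y^2) - S(xy)^2 / S(x^2)
S_yy = [rss + sxy**2 / sxx for rss, sxy, sxx in zip(residual_ss, S_xy, S_xx)]

# Assumed form of equation (3): pooled within-sample product-moment coefficient
r_common = sum(S_xy) / sqrt(sum(S_xx) * sum(S_yy))
print(f"r = {r_common:.3f}")
```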


This value lies right inside our I₂ interval and thus, having no deviation from the “nearest” ρ-value in I₂, returns our ratio c₀ as insignificant.

Though we obtain the same result as with the usual tests, it is obvious that in 
this example our test is more sensitive to the difference between the above 
covariance estimates. For our largest P-value is about 0-14 whilst the regression 
test yielded a P greater than 0-5 and Fisher’s test of the z-transformation a 
P-value of about 0-2. 

An explanation of the calculations on which Table I is based follows. We have to compute the value of P⁺(ρ) (given by equation (1)) for k = K = 6, ρ = 0, ±.2, ±.4, ±.6, ±.8, ±1 and c₀ = 74/26.†

We first transform formula (1) into a form which is more suitable for its 
computation, whenever k= K. 

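Introducing the abbreviation

$$C(p,q;\rho)=\frac{4^{-k-K-2}}{kK}\,\frac{2^{p+q}\,(1-\rho^{2})^{k+K+2}}{B(k,k+2-q)\,B(K,K+2-p)}\,\{(1-\rho)^{-p-q}+(1+\rho)^{-p-q}\},$$

we have

$$P^{+}(\rho)=\sum_{q=1}^{k+1}\sum_{p=1}^{K+1}I_X(p,q)\,C(p,q;\rho).$$

Now, since the I-function is only tabulated for p ≥ q, we write, using I_X(p, q) = 1 − I_{1−X}(q, p),

$$P^{+}(\rho)=\sum_{q=1}^{k+1}\sum_{p=q}^{K+1}I_X(p,q)\,C(p,q;\rho)+\sum_{q=2}^{k+1}\sum_{p=1}^{q-1}\{1-I_{1-X}(q,p)\}\,C(p,q;\rho). \qquad (4)$$

But since in our example k = K, and thus C(p, q; ρ) = C(q, p; ρ), we may write instead of equation (4)

$$P^{+}(\rho)=\sum_{q=1}^{k+1}\sum_{p=q}^{k+1}I_X(p,q)\,C(p,q;\rho)+\sum_{q=1}^{k}\sum_{p=q+1}^{k+1}\{1-I_{1-X}(p,q)\}\,C(p,q;\rho),$$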
* The value 74/26 shown in Table I is slightly smaller than the actual value of c₀ observed, viz. 2.974. This has been done to simplify the work and will be explained later.
† See the footnote to p. 68.
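and finally we arrive at

$$P^{+}(\rho)=\frac{(1-\rho^{2})^{2k+2}}{k^{2}\,4^{2k+2}}\left[\sum_{q=1}^{k+1}\frac{1}{B(k,k+2-q)}\sum_{p=q}^{k+1}\frac{2^{p+q}\,I_X(p,q)}{B(k,k+2-p)}\,Q(\rho^{2},p+q)+\sum_{q=1}^{k}\frac{1}{B(k,k+2-q)}\sum_{p=q+1}^{k+1}\frac{2^{p+q}\,\{1-I_{1-X}(p,q)\}}{B(k,k+2-p)}\,Q(\rho^{2},p+q)\right], \qquad (5)$$

where

$$Q(\rho^{2},p+q)=(1-\rho)^{-p-q}+(1+\rho)^{-p-q}.$$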
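The folding in (4) and (5) rests on the identity I_X(q, p) = 1 − I_{1−X}(p, q) and, for k = K, on the symmetry C(p, q; ρ) = C(q, p; ρ). The sketch below checks numerically that the folded sum equals the full double sum of formula (1). Its function names are hypothetical, and the integer-argument incomplete B-function is computed by the binomial identity rather than read from Pearson's tables.

```python
from math import comb, factorial


def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def inv_beta(m, n):
    """1 / B(m, n) for positive integers m, n."""
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1))


def C(p, q, rho, k):
    """C(p, q; rho) for k = K; symmetric in p and q."""
    n = 2 * k + 2
    m = p + q
    w = ((1 + rho) ** m + (1 - rho) ** m) * (1 - rho * rho) ** (n - m)
    return 2**m * inv_beta(k, k + 2 - q) * inv_beta(k, k + 2 - p) * w / (4**n * k * k)


def full_sum(X, rho, k):
    """Formula (1) summed over every (p, q)."""
    return sum(inc_beta(X, p, q) * C(p, q, rho, k)
               for q in range(1, k + 2) for p in range(1, k + 2))


def folded_sum(X, rho, k):
    """Only p >= q, with I_X(q, p) replaced by 1 - I_(1-X)(p, q) when p > q."""
    total = 0.0
    for q in range(1, k + 2):
        for p in range(q, k + 2):
            total += inc_beta(X, p, q) * C(p, q, rho, k)
            if p > q:
                total += (1 - inc_beta(1 - X, p, q)) * C(p, q, rho, k)
    return total


if __name__ == "__main__":
    print(full_sum(0.26, 0.4, 6), folded_sum(0.26, 0.4, 6))
```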


These two sums are now most conveniently worked out together. The first step consists in preparing a table of the values Q(ρ², p + q). This table, because it is independent of c₀, may be used by the reader for similar tests, provided the number of observations, when added for the two samples, is not greater than 30 (n′ + N′ = 2k + 2K + 6 ≤ 30 implies p + q ≤ k + K + 2 ≤ 14). The accuracy which is required for the entries of this table depends on the size of the sample and (especially for high values of ρ) increases considerably as p + q increases. Therefore a 5-figure table has been prepared rather than giving entries accurate up to a certain decimal.


TABLE II
Values of $(1-\rho)^{-p-q}+(1+\rho)^{-p-q}$

   (a)        (b)             (c)             (d)             (e)
  p+q       ρ = .2          ρ = .4          ρ = .6          ρ = .8

    2       2.2569          3.2880          6.6406          2.5309 × 10
    3       2.5318          4.9941          1.5869 × 10     1.2517 × 10²
    4       2.9237          7.9764          3.9215+ × 10    6.2510 × 10²
    5       3.4536          1.3046 × 10     9.7752 × 10     3.1251 × 10³
    6       4.1496          2.1566 × 10     2.4420 × 10²    1.5625 × 10⁴
    7       5.0475−         3.5817 × 10     6.1039 × 10²
    8       6.1930          5.9605+ × 10    1.5259 × 10³
    9       7.6444          9.9277 × 10     3.8147 × 10³
   10       9.4747          1.6542 × 10²    9.5368 × 10³
   11       1.1776 × 10     2.7566 × 10²    2.3842 × 10⁴
   12       1.4664 × 10     4.5941 × 10²    5.9605− × 10⁴
   13       1.8283 × 10     7.6567 × 10²    1.4901 × 10⁵
   14       2.2815+ × 10    1.2761 × 10³    3.7253 × 10⁵
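Table II depends only on ρ and p + q, so it can be regenerated directly. The sketch below prints all four columns for p + q = 2, ..., 14 to 5 significant figures. It omits the + and − rounding marks used in the printed table.

```python
def Q(rho, m):
    """(1 - rho)^(-m) + (1 + rho)^(-m): the entry of Table II for p + q = m."""
    return (1 - rho) ** -m + (1 + rho) ** -m


if __name__ == "__main__":
    rhos = (0.2, 0.4, 0.6, 0.8)
    print("p+q " + "".join(f"{'rho = ' + str(r):>16}" for r in rhos))
    for m in range(2, 15):
        print(f"{m:>3} " + "".join(f"{Q(r, m):>16.5g}" for r in rhos))
```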


Next we write down the values of [B(6, 8 − q)]⁻¹ for q = 1, ..., 7, which are easily obtained from their definition, viz.

$$[B(6,8-q)]^{-1}=\frac{(13-q)!}{5!\,(7-q)!}.$$

Finally, we prepare with the help of Pearson’s tables of the incomplete B-function a table of the values

$$\frac{2^{p+q}}{B(6,8-p)}\,\{1-I_{.74}(p,q)+I_{.26}(p,q)\}\qquad\text{for } p>q.$$
TABLE III
Values of 1/B(6, 8 − q)

  q                 1       2       3       4       5       6       7
  1/B(6, 8 − q)   5544    2772    1260     504     168      42       6


In choosing .26 for X we have substituted for c₀ a value which is slightly smaller than the ratio actually observed, in order to coincide with an entry in Pearson’s table: for k = K, X = 1/(1 + c₀), so that X = .26 corresponds to c₀ = 74/26 ≈ 2.846, against the observed 2.974. This we have done to save the reader the work of about 56 interpolations. The diagonal entries of the table are the values 2^{2p}[B(6, 8 − p)]⁻¹ × I_{.26}(p, p), since these are required for the further calculations.
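The upper part of Table IV can likewise be recomputed, with exact incomplete B-function values in place of Pearson's tables. For k = K the argument X = (K + 1)/{(K + 1) + c₀(k + 1)} reduces to 1/(1 + c₀), so c₀ = 74/26 gives X = .26. Small differences in the last figure against the printed entries may appear, given the adding-machine arithmetic. The function name is hypothetical.

```python
from math import comb, factorial


def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def inv_beta(m, n):
    """1 / B(m, n) for positive integers m, n."""
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1))


k = 6              # k = K = 6: two samples of 15
c0 = 74 / 26
X = 1 / (1 + c0)   # (K + 1)/{(K + 1) + c0 (k + 1)} with k = K; equals .26


def table_iv_entry(p, q):
    """Entry in row p, column q (p >= q) of the upper part of Table IV."""
    scale = 2 ** (p + q) * inv_beta(k, k + 2 - p)       # 2^(p+q) / B(6, 8 - p)
    if p == q:
        return scale * inc_beta(X, p, p)
    return scale * (1 - inc_beta(1 - X, p, q) + inc_beta(X, p, q))


if __name__ == "__main__":
    for p in range(1, k + 2):
        print(f"{p}: " + "  ".join(f"{table_iv_entry(p, q):9.1f}" for q in range(1, p + 1)))
```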


TABLE IV
Values of $2^{p+q}[B(6,8-p)]^{-1}\{1-I_{.74}(p,q)+I_{.26}(p,q)\}$ for $p>q$ and $2^{2p}[B(6,8-p)]^{-1}I_{.26}(p,p)$ for $p=q$

  p \ q        1            2            3            4            5            6            7          (*)
  1         5,765.8
  2        11,531.5      7,435.5
  3        12,345.0     13,519.1      9,220.6
  4        11,365.5     13,107.9     14,752.9     10,342.0
  5         8,378.9     10,649.1     12,040.7     13,789.4      9,829.0
  6         4,494.9      6,250.3      7,564.2      8,478        9,829.0      7,085.8
  7         1,349.5      2,020.9      2,636.2      3,101.5      3,461.4      4,049.0      2,943.2
  (a) ρ = 0  11,046 × 10  10,597 × 10  9,243 × 10   7,142 × 10   4,624 × 10   2,227 × 10   5,886   1,067 × 10⁶
(b) p=-2| 1,834 2,307 x 10? | 2,701 x 10 | 2,863 178x 10°} 67x 108 | 219x107 
(c) p=-4 75x 1,473 x 10° | 2,632 x 10? | 4,243 x 636x10*| 38x10® | 150x108 
(d) 87x 2,856 836 x 10° | 2,232 x 105 534 x 108 | 1,026 110 x 107 48 x 
(e) p=°8 105 x 10? 75x 108| 465x108} 265x10®| 1,421 x 10° | 6,673 x 10° | 1,796 x 10° 85 x 10% 

The rest of the work is obvious from formula (5) and is shown in the above 
table. 


In rows (a), (b), (c), (d) and (e) are given weighted totals of the respective columns in the upper part of Table IV. To calculate the weighted column totals shown in row (b) (say), we multiply the entry in the pth row and qth column of Table IV by the entry of Table II in the row labelled p + q and in column (b). These products are then summed to yield the weighted column totals in row (b) of Table IV; similarly for rows (c), (d) and (e). The entries of row (a) are simply obtained by doubling the column totals of Table IV, since for ρ = 0 every weight (1 − ρ)^{−p−q} + (1 + ρ)^{−p−q} equals 2.

Finally the qth entry of each of the rows (a), (b), (c), (d) and (e) in Table IV is multiplied by the qth entry of Table III and these products summed for each row,


† The great loss of accuracy is due to the small scope of an ordinary adding machine, with which the work had to be done.
the respective sums of products being shown in column (*). If these sums of products are divided by the respective values of 36 × 4¹⁴ × (1 − ρ²)^{−14}, i.e. of k² · 4^{2k+2} · (1 − ρ²)^{−(2k+2)} for k = 6, the chances P⁺(ρ) of Table I are obtained. For ρ = 0, for example, 1,067 × 10⁶/(36 × 4¹⁴) ≈ .110.
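The remaining steps, namely weighting by Table II, multiplying by Table III and dividing by 36 × 4¹⁴ × (1 − ρ²)^{−14}, are sketched below as one function. The values ρ = ±1 are excluded because the entries of Table II become infinite there; that case is given directly by I_X(7, 7). The function names are hypothetical.

```python
from math import comb, factorial


def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def inv_beta(m, n):
    """1 / B(m, n) for positive integers m, n."""
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1))


k, X = 6, 0.26


def entry(p, q):
    """Upper part of Table IV (p >= q)."""
    scale = 2 ** (p + q) * inv_beta(k, k + 2 - p)
    if p == q:
        return scale * inc_beta(X, p, p)
    return scale * (1 - inc_beta(1 - X, p, q) + inc_beta(X, p, q))


def Q(rho, m):
    """Table II entry for p + q = m."""
    return (1 - rho) ** -m + (1 + rho) ** -m


def chance(rho):
    """P+(rho) assembled as in the text; requires |rho| < 1."""
    total = 0.0
    for q in range(1, k + 2):
        weighted_column = sum(entry(p, q) * Q(rho, p + q) for p in range(q, k + 2))
        total += weighted_column * inv_beta(k, k + 2 - q)       # Table III
    # divide by k^2 * 4^(2k+2) * (1 - rho^2)^-(2k+2), i.e. 36 * 4^14 * (1 - rho^2)^-14
    return total * (1 - rho * rho) ** (2 * k + 2) / (k * k * 4 ** (2 * k + 2))


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.4, 0.6, 0.8):
        print(f"rho = {rho:.1f}   P+ = {chance(rho):.3f}")
```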

My heartiest thanks are due to Dr J. Wishart for his help and advice throughout this work and to Mr F. J. Dudley for his help in preparing Table II.


APPENDIX 
Derivation of the distribution function for the ratio of covariance estimates 


(i) In this part we shall give a mathematical proof of the formulae (1) and (2), 
which have been used for a practical test in the preceding part of the paper. 
Incidentally we shall derive the theory for a more general problem and point out 
further properties of our distribution, which are analogous to the well-known 
behaviour of the z-distribution. 


The problem may be stated in its generalized form straightaway:

Let      x_i, y_i   (i = 1, 2, ..., n′ = 2k + 3)
and      X_j, Y_j   (j = 1, 2, ..., N′ = 2K + 3)
be two (independent) random samples drawn from the populations

$$f(x,y)=\frac{(\alpha\beta-\nu^{2})^{1/2}}{\pi}\exp\{-(\alpha x^{2}+2\nu xy+\beta y^{2})\} \qquad (6)$$

and

$$F(X,Y)=\frac{(AB-N^{2})^{1/2}}{\pi}\exp\{-(AX^{2}+2NXY+BY^{2})\} \qquad (7)$$

respectively.†


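It is then required to obtain the distribution function for the ratio of covariance estimates

$$c=\frac{\sum_{i=1}^{n'}(x_i-\bar x)(y_i-\bar y)\,/\,(n'-1)}{\sum_{j=1}^{N'}(X_j-\bar X)(Y_j-\bar Y)\,/\,(N'-1)},$$

where by x̄, ȳ, X̄, Ȳ we denote the respective arithmetic means of the samples, viz.

$$\bar x=\frac{1}{n'}\sum_{i=1}^{n'}x_i,\quad\text{etc.}$$

In the course of the proof, the ratio of the “sums of products”,

$$w=\frac{\sum_{i=1}^{n'}(x_i-\bar x)(y_i-\bar y)}{\sum_{j=1}^{N'}(X_j-\bar X)(Y_j-\bar Y)},$$

will turn out to be a more convenient statistic than c itself.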


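† Instead of the parameters α, β, ν the quantities σ₁, σ₂, ρ (i.e. the standard deviations of x and of y and the correlation between x and y) are more commonly used to represent a normal population. In terms of σ₁, σ₂, ρ the parameters α, β, ν are defined by

$$\alpha=\frac{1}{2\sigma_1^{2}(1-\rho^{2})},\qquad \beta=\frac{1}{2\sigma_2^{2}(1-\rho^{2})},\qquad \nu=-\frac{\rho}{2\sigma_1\sigma_2(1-\rho^{2})}.$$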
(ii) Our first step is to show that the distribution of the sum of products

$$u=\sum_{i=1}^{n'}(x_i-\bar x)(y_i-\bar y) \qquad (9)$$

may be expressed as a finite sum of elementary functions, provided n′, the number of items in the sample, is odd. If we introduce new variates ξ, η by the linear transformation

$$x=\sqrt{\beta/\alpha}\;\xi+\eta,\qquad y=\xi-\sqrt{\alpha/\beta}\;\eta, \qquad (10)$$

then by substituting (10) in (6) we obtain for the distribution of ξ, η

$$\frac{2(\alpha\beta-\nu^{2})^{1/2}}{\pi}\exp\{-(2\beta+2\nu\sqrt{\beta/\alpha})\,\xi^{2}-(2\alpha-2\nu\sqrt{\alpha/\beta})\,\eta^{2}\}. \qquad (11)$$


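Now to any sample of n′ pairs x_i, y_i there will correspond a sample of n′ pairs ξ_i, η_i obtained by the converse of (10). Furthermore, obviously

$$u=\sqrt{\beta/\alpha}\;\gamma_{11}-\sqrt{\alpha/\beta}\;\gamma_{22},\qquad \gamma_{11}=\sum_{i=1}^{n'}(\xi_i-\bar\xi)^{2},\quad \gamma_{22}=\sum_{i=1}^{n'}(\eta_i-\bar\eta)^{2}. \qquad (12)$$

From equation (11) it is obvious that the variates ξ, η are independently distributed. Hence the joint distribution of γ₁₁, γ₂₂ (see e.g. (4)) is given by

$$\frac{[4(\alpha\beta-\nu^{2})]^{k+1}}{(k!)^{2}}\,\gamma_{11}^{k}\,\gamma_{22}^{k}\exp\{-(2\beta+2\nu\sqrt{\beta/\alpha})\,\gamma_{11}-(2\alpha-2\nu\sqrt{\alpha/\beta})\,\gamma_{22}\},$$

where k = (n′ − 3)/2 is an integer ≥ 1. Thus by equation (12) we arrive at the joint distribution of u and γ₂₂, viz.

$$\chi(u,\gamma_{22})=\frac{4^{k+1}(\alpha\beta-\nu^{2})^{k+1}(\alpha/\beta)^{(k+1)/2}}{(k!)^{2}}\exp\{-(2\sqrt{\alpha\beta}+2\nu)\,u\}\,(u+\sqrt{\alpha/\beta}\;\gamma_{22})^{k}\,\gamma_{22}^{k}\exp(-4\alpha\gamma_{22}). \qquad (13)$$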
To obtain the distribution function g(u) (say) of the sum of products we have to integrate χ(u, γ₂₂) (see equation (13)) over the range of γ₂₂. This range, however, depends on u. For since the range of the variates γ₁₁, γ₂₂ is given by

  0 ≤ γ₁₁ < +∞,   0 ≤ γ₂₂ < +∞,

the range for u, γ₂₂ must be

  −∞ < u < +∞,   max(0, −u√(β/α)) ≤ γ₂₂ < +∞.

We first consider the case u ≥ 0.

In this case we have to integrate χ(u, γ₂₂) for γ₂₂ ranging from 0 to +∞. To do this
we apply partial integration (2k + 1) times, integrating exp{−4αγ₂₂}, differentiating the powers of γ₂₂, and collecting the terms at γ₂₂ = 0. We thus obtain, after simplification,

$$g^{+}(u)=\frac{(\alpha\beta-\nu^{2})^{k+1}}{k!\,(\sqrt{\alpha\beta})^{k+1}}\exp\{-(2\sqrt{\alpha\beta}+2\nu)\,u\}\sum_{r=0}^{k}\frac{(k+r)!}{r!\,(k-r)!}\,(4\sqrt{\alpha\beta})^{-r}\,u^{k-r}. \qquad (14)$$

We next consider the case u ≤ 0.

Now we have to integrate for γ₂₂ ranging from −u√(β/α) to +∞. Integrating by parts as above and collecting the terms at γ₂₂ = −u√(β/α) we have

$$g^{-}(u)=\frac{(\alpha\beta-\nu^{2})^{k+1}}{k!\,(\sqrt{\alpha\beta})^{k+1}}\exp\{(2\sqrt{\alpha\beta}-2\nu)\,u\}\sum_{r=0}^{k}\frac{(k+r)!}{r!\,(k-r)!}\,(4\sqrt{\alpha\beta})^{-r}\,(-u)^{k-r}. \qquad (15)$$

The required distribution function for u is thus given by

$$g(u)=\begin{cases}g^{+}(u) & \text{for } u\ge 0\ \text{(see equation (14))},\\[2pt] g^{-}(u) & \text{for } u\le 0\ \text{(see equation (15))}.\end{cases} \qquad (16)$$

Comparing these with the “Bessel-function distribution” of u (6), we incidentally have proved the well-known fact that the Bessel function K_m(x), for fractional (half-integral) order m, can be expressed by elementary functions in a finite form.
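A quick numerical check of (14) to (16) is sketched below. It codes the finite sums directly, with α, β, ν obtained from σ₁, σ₂, ρ through the standard relations α = 1/{2σ₁²(1 − ρ²)}, β = 1/{2σ₂²(1 − ρ²)}, ν = −ρ/{2σ₁σ₂(1 − ρ²)}. The trial values σ₁ = 1.5, σ₂ = 0.8, ρ = .5 and k = 3 (so n′ = 9), the integration range and the step count are all assumptions of the sketch. The trapezoidal rule then confirms that g integrates to 1 and has mean (n′ − 1)ρσ₁σ₂, the expected value of a sum of products.

```python
from math import exp, factorial, sqrt


def params(s1, s2, rho):
    """alpha, beta, nu of the population (6) in terms of sigma_1, sigma_2, rho."""
    d = 2 * (1 - rho * rho)
    return 1 / (d * s1 * s1), 1 / (d * s2 * s2), -rho / (d * s1 * s2)


def g(u, alpha, beta, nu, k):
    """Density of the sum of products u for n' = 2k + 3 items:
    C * exp(-(2 sqrt(ab) + 2 nu) u) * S(u)   for u >= 0,
    C * exp( (2 sqrt(ab) - 2 nu) u) * S(-u)  for u <= 0, where
    C = (ab - nu^2)^(k+1) / (k! sqrt(ab)^(k+1)) and
    S(v) = sum_r (k+r)! / (r! (k-r)!) (4 sqrt(ab))^(-r) v^(k-r)."""
    root = sqrt(alpha * beta)
    const = (alpha * beta - nu * nu) ** (k + 1) / (factorial(k) * root ** (k + 1))
    v = abs(u)
    series = sum(factorial(k + r) / (factorial(r) * factorial(k - r))
                 * (4 * root) ** (-r) * v ** (k - r) for r in range(k + 1))
    if u >= 0:
        return const * exp(-(2 * root + 2 * nu) * u) * series
    return const * exp((2 * root - 2 * nu) * u) * series


def mass_and_mean(alpha, beta, nu, k, lo=-150.0, hi=150.0, steps=60_000):
    """Trapezoidal-rule estimates of the total probability and the mean of g."""
    h = (hi - lo) / steps
    mass = mean = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        gu = g(u, alpha, beta, nu, k)
        mass += weight * gu
        mean += weight * u * gu
    return mass * h, mean * h


if __name__ == "__main__":
    k, s1, s2, rho = 3, 1.5, 0.8, 0.5
    m, mu = mass_and_mean(*params(s1, s2, rho), k)
    print(f"mass = {m:.6f}, mean = {mu:.4f}, (n'-1) rho s1 s2 = {(2 * k + 2) * rho * s1 * s2:.4f}")
```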
(iii) It is now easy to see how the distribution of the ratio of the two sums of products

$$w=\frac{u}{U}=c\,\frac{k+1}{K+1}$$

can be derived from equations (14) and (15) by elementary integrations. For if we consider the sum of products

$$U=\sum_{j=1}^{N'}(X_j-\bar X)(Y_j-\bar Y),$$

the sample of which has been drawn from the population (7), then the distribution of U (say G(U)) is obtained by replacing in equations (14), (15) and (16) the letters u, α, β, ν, g, k = (n′ − 3)/2 by U, A, B, N, G, K = (N′ − 3)/2, respectively. Hence, independence of the samples (x_i, y_i) and (X_j, Y_j) being assumed, the joint distribution of u and U is equal to g(u) × G(U), and thus the distribution function of w (say Φ(w)) is given by

$$\Phi(w)=\int_{-\infty}^{+\infty}g(wU)\,G(U)\,|U|\,dU, \qquad (17)$$

since the modulus of the Jacobian of the transformation u = wU, U = U is simply equal to |U|. To work out equation (17) we first consider the case w ≥ 0.


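We now may write

$$\Phi^{+}(w)=\int_{-\infty}^{0}g^{-}(wU)\,G^{-}(U)\,(-U)\,dU+\int_{0}^{\infty}g^{+}(wU)\,G^{+}(U)\,U\,dU=T_1(w)+T_2(w)\ \text{(say)},$$

and start to consider T₂(w). Now by equation (14) plainly

$$T_2(w)=\frac{(\alpha\beta-\nu^{2})^{k+1}(AB-N^{2})^{K+1}}{k!\,K!\,(\sqrt{\alpha\beta})^{k+1}(\sqrt{AB})^{K+1}}\sum_{r=0}^{k}\sum_{s=0}^{K}\frac{(k+r)!\,(K+s)!\,(4\sqrt{\alpha\beta})^{-r}(4\sqrt{AB})^{-s}}{r!\,(k-r)!\,s!\,(K-s)!}\,w^{k-r}\int_{0}^{\infty}U^{k+K-r-s+1}\exp\{-2[(\sqrt{\alpha\beta}+\nu)w+(\sqrt{AB}+N)]\,U\}\,dU,$$

where the last integral, by partial integration, is seen to be equal to

$$(k+K-r-s+1)!\,\bigl[2\{(\sqrt{\alpha\beta}+\nu)w+\sqrt{AB}+N\}\bigr]^{-(k+K-r-s+2)}.$$

If then T₁(w) is worked out in the same way with the help of equation (15) we finally have

$$\Phi^{+}(w)=\frac{(\alpha\beta-\nu^{2})^{k+1}(AB-N^{2})^{K+1}}{k!\,K!\,(\sqrt{\alpha\beta})^{k+1}(\sqrt{AB})^{K+1}}\sum_{r=0}^{k}\sum_{s=0}^{K}\frac{(k+r)!\,(K+s)!\,(k+K-r-s+1)!}{r!\,(k-r)!\,s!\,(K-s)!}\,\frac{(4\sqrt{\alpha\beta})^{-r}(4\sqrt{AB})^{-s}}{2^{k+K-r-s+2}}\,w^{k-r}\Bigl\{[(\sqrt{\alpha\beta}+\nu)w+(\sqrt{AB}+N)]^{-(k+K-r-s+2)}+[(\sqrt{\alpha\beta}-\nu)w+(\sqrt{AB}-N)]^{-(k+K-r-s+2)}\Bigr\}, \qquad (18)$$

or, introducing i = k − r and j = K − s as summation indices,

$$\Phi^{+}(w)=\frac{(\alpha\beta-\nu^{2})^{k+1}(AB-N^{2})^{K+1}}{k!\,K!\,(\sqrt{\alpha\beta})^{k+1}(\sqrt{AB})^{K+1}}\,2^{-k-K-2}\sum_{i=0}^{k}\sum_{j=0}^{K}\frac{(2k-i)!\,(2K-j)!\,(i+j+1)!}{i!\,(k-i)!\,j!\,(K-j)!}\,(2\sqrt{\alpha\beta})^{-k+i}(2\sqrt{AB})^{-K+j}\,w^{i}\Bigl\{[(\sqrt{\alpha\beta}+\nu)w+(\sqrt{AB}+N)]^{-(i+j+2)}+[(\sqrt{\alpha\beta}-\nu)w+(\sqrt{AB}-N)]^{-(i+j+2)}\Bigr\}. \qquad (19)$$

Turning now to the case w ≤ 0, we obtain by the same argument as above for

$$\Phi^{-}(w)=\int_{-\infty}^{0}g^{+}(wU)\,G^{-}(U)\,(-U)\,dU+\int_{0}^{\infty}g^{-}(wU)\,G^{+}(U)\,U\,dU$$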
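the sum

$$\Phi^{-}(w)=\frac{(\alpha\beta-\nu^{2})^{k+1}(AB-N^{2})^{K+1}}{k!\,K!\,(\sqrt{\alpha\beta})^{k+1}(\sqrt{AB})^{K+1}}\,2^{-k-K-2}\sum_{i=0}^{k}\sum_{j=0}^{K}\frac{(2k-i)!\,(2K-j)!\,(i+j+1)!}{i!\,(k-i)!\,j!\,(K-j)!}\,(2\sqrt{\alpha\beta})^{-k+i}(2\sqrt{AB})^{-K+j}\,(-w)^{i}\Bigl\{[(\sqrt{\alpha\beta}+\nu)(-w)+(\sqrt{AB}-N)]^{-(i+j+2)}+[(\sqrt{\alpha\beta}-\nu)(-w)+(\sqrt{AB}+N)]^{-(i+j+2)}\Bigr\}. \qquad (20)$$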
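(iv) To use equations (19) and (20) for a test of significance we confine ourselves to the most important special case, viz. we assume that the populations (6) and (7) are identical,* whence

$$A=\alpha,\qquad B=\beta,\qquad N=\nu. \qquad (21)$$

We first consider the distribution Φ⁺(w) for w ≥ 0. Using equation (21) and introducing the correlation coefficient

$$\rho=-\nu/\sqrt{\alpha\beta},$$

equation (19) may be written as

$$\Phi^{+}(w)=\frac{(1-\rho^{2})^{k+K+2}}{k!\,K!}\sum_{i=0}^{k}\sum_{j=0}^{K}\frac{(2k-i)!\,(2K-j)!\,(i+j+1)!}{i!\,(k-i)!\,j!\,(K-j)!}\,2^{-2k-2K+i+j-2}\,w^{i}(1+w)^{-i-j-2}\{(1-\rho)^{-i-j-2}+(1+\rho)^{-i-j-2}\}. \qquad (22)$$

Thus Φ⁺(w) is seen to depend on k = (n′ − 3)/2, K = (N′ − 3)/2 and ρ². Furthermore, as ρ² → 1,

$$\Phi^{+}(w)\to\frac{(k+K+1)!}{k!\,K!}\,w^{k}(1+w)^{-k-K-2},$$

and this is the distribution function for w = [(k + 1)/(K + 1)]e^{2z}.

To prove formula (1) we have to determine the probability integral

$$P^{+}=\int_{W}^{\infty}\Phi^{+}(w)\,dw\qquad\text{for any } W\ge 0.$$

But since

$$\int_{W}^{\infty}w^{i}(1+w)^{-i-j-2}\,dw=\int_{0}^{1/(1+W)}w^{j}(1-w)^{i}\,dw,$$

we see from equation (22) that we can express our probability integral with the help of the incomplete B-functions. Using the notation of p. 67 we obtain

$$\int_{W}^{\infty}\Phi^{+}(w)\,dw=\frac{4^{-k-K-2}}{kK}\sum_{q=1}^{k+1}\sum_{p=1}^{K+1}\frac{2^{p+q}(1-\rho^{2})^{k+K+2}}{B(k,k+2-q)\,B(K,K+2-p)}\{(1-\rho)^{-p-q}+(1+\rho)^{-p-q}\}\,I_{1/(1+W)}(p,q), \qquad (23)$$

which, since 1/(1 + W) = X when W = c₀(k + 1)/(K + 1), is equivalent to formula (1).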
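A Monte Carlo check of (23), and hence of formula (1), is sketched below. It draws pairs of independent samples of 15 from a bivariate normal population, with unit variances and ρ = .4 (both assumptions of the sketch), counts how often the ratio of covariance estimates exceeds 74/26, and compares the proportion with the finite sum. With 10,000 trials the sampling error of the proportion is about .003. The function names and the seed are illustrative.

```python
import random
from math import comb, factorial, sqrt


def inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b (binomial-sum identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def inv_beta(m, n):
    """1 / B(m, n) for positive integers m, n."""
    return factorial(m + n - 1) / (factorial(m - 1) * factorial(n - 1))


def p_plus(rho, k, K, c0):
    """Formula (1) / (23), written with non-negative powers of (1 +- rho)."""
    X = (K + 1) / ((K + 1) + c0 * (k + 1))
    n = k + K + 2
    total = 0.0
    for q in range(1, k + 2):
        for p in range(1, K + 2):
            m = p + q
            w = ((1 + rho) ** m + (1 - rho) ** m) * (1 - rho * rho) ** (n - m)
            total += (2**m * inv_beta(k, k + 2 - q) * inv_beta(K, K + 2 - p)
                      * w * inc_beta(X, p, q))
    return total / (4**n * k * K)


def sum_of_products(rng, n, rho):
    """u = sum (x - xbar)(y - ybar) for n pairs with unit variances and correlation rho."""
    t = sqrt(1 - rho * rho)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + t * z2)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def simulate(rho, n, c0, trials, seed=1):
    """Proportion of sample pairs whose covariance ratio exceeds c0 (equal n, so c = u/U)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = sum_of_products(rng, n, rho)
        U = sum_of_products(rng, n, rho)
        if u / U > c0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    theory = p_plus(0.4, 6, 6, 74 / 26)
    sim = simulate(0.4, 15, 74 / 26, 10_000)
    print(f"formula (1): {theory:.3f}   simulation: {sim:.3f}")
```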


* In order to derive formulae (1) and (2) it would be sufficient to assume that for the populations (6) and (7) the equations ν = N and αβ = AB are fulfilled.
A further remark may be added. If the two independent samples had been drawn from populations having the same correlation coefficient but unequal variances (i.e. if in equations (6) and (7) ν/√(αβ) = N/√(AB) = −ρ), then it is easy to see that again test (23) is valid with w and W replaced by w′ = w√(αβ)/√(AB) and W′ = W√(αβ)/√(AB) respectively.* This is analogous to the well-known property of the e^{2z} distribution.

We now turn to the case of negative ratios w ≤ 0. From equation (20) we obtain under assumption (21) for the distribution of negative values of w

$$\Phi^{-}(w)=\frac{(1-\rho^{2})^{k+K+2}}{k!\,K!}\sum_{i=0}^{k}\sum_{j=0}^{K}\frac{(2k-i)!\,(2K-j)!\,(i+j+1)!}{i!\,(k-i)!\,j!\,(K-j)!}\,2^{-2k-2K+i+j-2}\,(-w)^{i}\bigl\{[(1+\rho)+(1-\rho)(-w)]^{-i-j-2}+[(1-\rho)+(1+\rho)(-w)]^{-i-j-2}\bigr\}.$$

It is easy to see that, as ρ² → 1, Φ⁻(w) → 0 uniformly in any finite interval of non-positive w-values.

Finally, we obtain by elementary transformations for any W ≤ 0

$$P^{-}=\int_{-\infty}^{W}\Phi^{-}(w)\,dw=\frac{4^{-k-K-2}}{kK}\sum_{q=1}^{k+1}\sum_{p=1}^{K+1}\frac{2^{p+q}(1-\rho^{2})^{k+K+2}}{B(k,k+2-q)\,B(K,K+2-p)}\Bigl[(1-\rho)^{-p}(1+\rho)^{-q}\,I_{\frac{1-\rho}{(1-\rho)-W(1+\rho)}}(p,q)+(1+\rho)^{-p}(1-\rho)^{-q}\,I_{\frac{1+\rho}{(1+\rho)-W(1-\rho)}}(p,q)\Bigr],$$

which is equivalent to formula (2).


If the two independent samples had been drawn from populations with different variances but equal correlation coefficients, then what has been stated about P⁺ also applies to P⁻.


* Thus the calculation of P⁺(ρ) only differs from that described previously in that a different row of Pearson’s table has to be entered. Should it, however, become necessary to calculate the best estimate of ρ, it has to be remembered that now the populations (6) and (7) may have different variances.
A CONTRIBUTION TO THE BIOMETRIC STUDY OF
THE HUMAN MANDIBLE 


By FRANK H. CLEAVER, M.A. 
Crewdson Benington Student in Craniometry 


1. Introduction. It is clear to-day that the statistical study of anthropometric 
data with the object of investigating racial origins and relationships requires 
numbers of subjects, or specimens, far in excess of those considered sufficient by 
earlier anthropologists. The newer methods also demand greater precision in 
measurement, and better control and standardization of the techniques used in 
collecting the data. There is far more metrical material available for the cranium 
than for any other part of the skeleton, and the results for it are far in advance of 
those for any other kind of anthropometric material. The available measurements 
of living series, though more extensive, are unfortunately of lesser value owing 
to the fact that there has been no effective control or standardization of the 
techniques used in determining them. Owing largely to recent work presented in 
papers in Biometrika, the mandible is now the part of the skeleton, after the 
cranium, which has been best described metrically. The present paper provides 
statistical data for four new male and three corresponding female series, viz. two 
English, a Punjabi (male only) and an Australian. In all there are now 17 male 
and 9 female series measured by following the same biometric technique, though 
it is clear that some of these are too small to be of permanent value by themselves. 
The statistical treatment applied here is the same as that of the earlier papers, 
and particular attention is paid to a discussion of the coefficients of racial 
likeness. It was not to be expected that the conditions which have to be fulfilled 
in interpreting these criteria of resemblance would be precisely the same for 
mandibular as for cranial material. There are clear differences between the two 
kinds of evidence in this respect, and a general conclusion reached is that measurements of more and longer series of mandibles are still needed in order to discover
how far the bone is capable of revealing racial relationships. It appears to be less 
effective for this purpose than the cranium. 

2. Description of the material. Original data relating to four series of mandibles 
are presented in this paper. The material consists of two London series (from 
Spitalfields and Farringdon Street), a Punjabi and a native Australian. The first 
two of these are preserved at University College, London, and permission to 
work on them was kindly granted by the late Professor Karl Pearson; the third 
and fourth, preserved in the Museum of the Royal College of Surgeons of England, 
were loaned by the College authorities, and I should like to take this opportunity 
of thanking them for their kindness, and the staff of the College for the consideration and assistance I at all times received from them.

(a) The Spitalfields skeletons were dug up in 1926,* and the nature of the 
interment in roughly circular pits, without any orderly arrangement of the bodies, 
pointed to a mass burial, the result of plague, massacre, or some such catastrophe. 
Examination of the crania showed that they are racially homogeneous, while 
comparisons between the crania of this and of other series (using the method of 
the coefficient of racial likeness) have indicated that the Spitalfields series 
lies closer to Pompeians, Etruscans and the population interred in the Church of 
St Leonard, Hythe, than it does to any other European series available. The type 
is far removed from those of the Neolithic, Bronze Age and Anglo-Saxon popula- 
tions of England, as well as from seventeenth-century crania excavated at 
Whitechapel and Farringdon Street. The Spitalfields interment is therefore one
of an intrusive population, and in the absence of datable artifacts, and of any other 
direct archaeological evidence, it has been concluded that it took place either in 
mediaeval or Roman times, this assumption being based mainly on an examina- 
tion of the history of the Spitalfields site. The measured series is made up by 63 
male and 32 female adult mandibles, only 12 of these being associated with crania. 
There are 195 other mandibles from Spitalfields which were not measured, either 
because they are immature, or else because they are too fragmentary for the 
purpose. Nearly 1000 crania from the site were preserved—the majority of these 
being incomplete—and it is estimated that about 3000 people were buried in the 
excavated area. 

(b) The Farringdon Street skeletons, of which the mandibles discussed in the 
present paper form part, were dug up in 1924. A detailed account of the evidence 
for dating the bones was prepared by Professor Karl Pearson.† As a result of his
examination it is safe to say that the interment of the Farringdon Street skeletons 
took place during the period 1610-1722 in the graveyard of the Parish Church of 
St Bride, but that the majority of the interments were made between 1610 and 
1666 and were mainly the results of deaths from the Great Plague, 1665. Miss 
Beatrix G. E. Hooke measured over 350 of the Farringdon Street crania and 
67 of the mandibles. The measurements of the unsexed mandibles were published 
in her paper, “A Third Study of the English Skull with special reference to the 
Farringdon Street Crania”.‡ She states that several hundred mandibles were dug
up, none being attached to skulls, but that the incomplete condition of these 
bones, due to breakages, prevented the taking of a fairly complete set of measure- 
ments except on 67. The present writer re-examined the collection of Farringdon 
Street mandibles and, bearing in mind certain requirements necessary for pur- 
poses of sexing (a more detailed account of which will be found in another section 


* G. M. Morant and M. F. Hoadley, “A Study of the Recently Excavated Spitalfields Crania”, Biometrika, xxiii (1931), pp. 191-248.

† Biometrika, xviii (1926), pp. 1-15.

‡ Ibid. pp. 1-55.
of this paper), he was able to pick out and measure 90, i.e. 23 more than Miss
Hooke measured. This difference can possibly be ascribed to the fact that Miss
Hooke chose for measurement only those mandibles on which all, or nearly all,
the 35 measurements used at that time could be taken, while the present writer 
used only 16 of those measurements selected as being the most reliable (see 
section 3 below). 

(c) The Australian mandibles dealt with in this paper are those in the Museum 
collections of the Royal College of Surgeons. These specimens were obtained 
from several sources at different times, and there are no series of any length 
among them from single burial-grounds. They are divided in the Museum 
catalogue according to a scheme of classification based on the modern State 
territories, but for purposes of statistical treatment they have been pooled, with 
the exception of five from the Northern Territory. The justification for this 
procedure is that a statistical examination of about 300 Australian crania 
collected from all over the continent suggests that only two racial divisions can 
be recognized—that from the Northern Territory, where immigration is most
likely to have affected the type, and that which is spread over the enormous area 
of the rest of Australia.* There is, as would be expected, a close connection 
between these two groups. An examination of the cranial facial skeleton of the 
two racial groups distinguished above revealed no significant differences between 
them, and hence it would have been of considerable interest to compare the 
Northern Territory mandibles with those from the remaining area. Unfortunately, 
no adequate numbers were available for the former group which was demarcated 
from the remainder in accordance with the cranial evidence mentioned above. 
Of this remainder the males were first divided into two sets. The first comprises 
those from Western Australia (3), South Australia (17) and Victoria (16), 
and the second those from Queensland (14) and New South Wales (9). This 
was a purely arbitrary division based on geographical position, and a com- 
parison of the two groups revealed no significant differences at all between them. 
Pooling is hence justified as far as can be seen from this evidence. Nine male 
mandibles from unknown localities were included in the pooled series. There are 
36 female adult Australian mandibles: 2 from Western Australia, 12 from South 
Australia, 4 from Victoria, 9 from Queensland, 6 from New South Wales and 3 from 
unknown localities. 

(d) The series described as Punjabi in this paper comprises those mandibles 
catalogued as such at the Royal College of Surgeons, and it is made up almost 
entirely of male mandibles from the collection which Sir Havelock Charles 


* G. M. Morant, “A Study of the Australian and Tasmanian Skulls, based on previously published Measurements”, Biometrika, xix (1927), pp. 417-40.

† The coefficient of racial likeness based on cranial measurements for these two groups is 1.58 ± .21 for the male and .69 ± .21 for the female series (Morant, loc. cit. p. 424). No great reliance can be placed on any generalizations concerning the racial composition of the whole of the Australian continent in view of the scanty nature of the material available.
presented to the Museum of the College. They belonged for the most part to
inmates of the British Hospital at Lahore, where Sir Havelock Charles was a 
surgeon, and as such they cannot really be considered as a random sample of the 
population of the Punjab. The few other mandibles included in the series are said 
to have come from various parts of the Punjab, and the specimens of the whole 
collection are variously catalogued as Sikh, Jat, Pathan, etc., etc. The main
scheme of classification, however, distinguishes two groups—Hindu and Moham- 
medan—the basis of distinction being thus religious and not ethnological. In 
the Punjab, besides Sikhs and Pathan immigrants from across the frontier—both 
Mohammedan conquering stocks—there are the Mohammedan converts. The 
religion of Islam seems to have taken a firm hold on the native population, and, 
judging from census returns, large numbers of Jats, Rajputs and Gujars were of 
the Mohammedan faith. Just as it is confidently asserted that in Bengal the
Mohammedans are of the same racial type as the lowest castes of Hindus, so in 
the Punjab the former are not clearly distinguished from the Hindus, though the 
religious divisions appear to be of significance from a racial point of view as will 
be shown below. The British Hospital at Lahore would no doubt have admitted 
the Sikhs, the Pathans, the descendants of the old Rajput rulers, the Jat peasantry 
and perhaps even some of the nomad Baloches, who are supposed to be of a 
distinctive physical type. The present sample cannot be considered a racially 
homogeneous one, and it must be considered from the racial standpoint only as 
representing an Indian type, in contrast to the European and Australian types 
also dealt with in this paper. Such lack of homogeneity was evidenced when the 
sample itself was divided into two groups, Mohammedan and Hindu. A comparison
for all characters between the male groups, made up of 27 and 22 mandibles
respectively, shows that differences exceeding 3·5 times their probable errors are
found only for ml (Δ/(p.e. Δ) = 6·4) and C′∠ (4·6) out of the 21 characters
compared. The coefficient of racial likeness between these two series was
calculated for 10 characters,* giving a crude value of 1·61 ± ·30 and a reduced
value of 7·47 ± 1·40. There is no doubt that the total Punjabi series is racially
heterogeneous, but division by religion may not be the best possible one, and for practical
purposes it seemed advisable to use the total group, which is far from an ideal 
procedure. The fact that in this paper the pooled sample has been treated like 
a homogeneous series may be partly responsible for the unsatisfactory results 
(given below) which are found when racial comparisons are made between this 
series from the Punjab and the Farringdon Street and other series. In all 49 male 
adult, 9 female adult and 2 immature Punjabi mandibles were measured. 
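The crude coefficient of racial likeness quoted above can be illustrated with a minimal Python sketch. It assumes Pearson's usual crude form, C.R.L. = (1/M) Σ [nₛnₛ′/(nₛ + nₛ′)] [(mₛ − mₛ′)/σₛ]² − 1 with probable error ±0·67449 √(2/M), where σₛ is taken from a standard series (the text states that Qau Egyptian standard deviations were used). The function name and data layout are hypothetical, and Morant's reduced coefficient is not attempted here.

```python
import math


def crude_crl(series_a, series_b, standard_sd):
    """Pearson's crude coefficient of racial likeness and its probable error.

    series_a, series_b: dict mapping character -> (n, mean) for each series
        (hypothetical layout).
    standard_sd: dict mapping character -> standard deviation of a standard
        series, used for every comparison.
    Only characters present in all three dicts are used; M is their count.
    """
    chars = [c for c in standard_sd if c in series_a and c in series_b]
    m = len(chars)
    if m == 0:
        raise ValueError("no characters in common")
    total = 0.0
    for c in chars:
        n1, mean1 = series_a[c]
        n2, mean2 = series_b[c]
        weight = n1 * n2 / (n1 + n2)  # per-character sample-size weight
        total += weight * ((mean1 - mean2) / standard_sd[c]) ** 2
    crl = total / m - 1.0
    probable_error = 0.67449 * math.sqrt(2.0 / m)
    return crl, probable_error
```

With M = 10 characters the probable error is 0·67449 × √0·2 ≈ ·30, the value attached to the crude coefficient above. Two series with identical means give exactly −1, while two random samples from one population give a value near zero on average.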

A list of all the previous series of mandibles on which measurements have been 
taken in accordance with the biometric technique is given by G. M. Morant.†


* A list of the characters used is given in a footnote to Table X and the Qau Egyptian
standard deviations were used in calculating the coefficient.
† Biometrika, XXVIII (1936), pp. 92-4.

He refers to 2 European, 4 Egyptian, and 6 Asiatic series. E. S. Martin has
provided similar data for another ancient Egyptian series.* Comparisons between
the four series dealt with for the first time in the present paper and the earlier
material are made below.†

3. Definitions of measurements and estimates of their accuracy. The biometric
technique for measuring the human mandible was given by G. M. Morant in
Biometrika, XIV, pp. 253-60, in 1923. Only those measurements included in a
revised list,‡ chosen chiefly because they could be taken with greater accuracy
than those in the original list, have been used. These are:

w,. Maximum breadth outside condyles avoiding excrescences on these 
processes. This maximum projection may be taken in any direction and 
it is not necessarily horizontal or transverse. 

c,l. Maximum projective length of the left condyle avoiding excrescences on 
these processes. This may be taken in any direction. 

rb’. Minimum antero-posterior breadth of the left ramus at any inclination 
to the horizontal, but with the posterior terminal never less than 13 mm. 
distant from the gonion. 

m,p,. Chord between the points on the outer left alveolar margin at the middle 
of the second molar (or its cavity) and at the middle of the first premolar 
(or its cavity). 

h,. Symphyseal height from infradental to the point farthest removed from
it in the symphyseal plane, this plane being determined by anatomical 
appreciation. 

zz. Minimum chord between the anterior margins of the right and left 
foramina mentalia. 

c,c,. Coronial breadth from right coronion to left coronion. If both condyles
are missing, the coronia (the tips of the coronoid processes) cannot be 
located with sufficient accuracy to justify the measurement being taken. 

The above seven are caliper measurements and all those below, except the
last, are taken with the aid of a mandible board of which photographs are given
in the paper describing the technique.

M∠. Mandibular angle, i.e. the angle between the standard horizontal and
standard rameal planes. 

c,l. The projective length of the corpus. 

rl. The projective length of the left ramus. 

* "A Study of an Egyptian Series of Mandibles, with Special Reference to Mathematical
Methods of Sexing", Biometrika, XXVIII (1936), pp. 149-78.

† Comparisons have not been made with the Anglo-Saxon series published by J. C. Brash,
Doris Layard and Matthew Young (ibid. XXV (1935), pp. 398-404), as the constants for it were not
available at the time when the calculations for the present paper were carried out.


‡ The technique is described and full definitions of the measurements finally adopted are given
in the Appendix of his 1936 paper cited.
ml. The maximum projective length of the mandible. Both condyles make
contact with the vertical rameal wing of the board, and the solid set-square 
makes contact with the most advanced point of the chin. 

c,h. Projective height of the left coronoid process. 


mh. Projective height of the corpus at the middle point of the outer alveolar 
margin of the second left molar. 


R∠. Angle of condylar-coronoidal line with ramus tangent. If either the
condylar or coronoid process is defective on the left side, then the mandible
is positioned from the right side.

g,g,. Chord from left gonion to right gonion, found with small calipers, the
mandible board being used to locate the gonia. 


The last measurement is taken with a goniometer. 


C′∠. Mental angle, i.e. the angle between the standard horizontal plane and
the line joining the infradental to the most anterior point in the standard
sagittal plane of the symphysis (pogonion). The infradental is defined to 
be the mid-point of the common tangent to the two curves of the outer 
alveolar margins of the central incisors. 


The question of personal equation in regard to the characters used in this 
paper was investigated by G. M. Morant in his paper "A Biometric Study of the
Human Mandible".* The choice of characters recommended in that paper,
chiefly on the ground that they had been shown to be the most reliable of all 
the characters originally defined in the technique, was accepted by the present 
writer who followed the amplified definitions given in the Appendix. 

In order to obtain estimates of personal equation, one series, the Spitalfields,
was measured on two occasions. The first set of readings (Cl₁) was obtained by
the writer when he was starting to measure mandibles, and the second set (Cl₂)
was taken by the writer after nearly three months' experience of the measurements
in the Galton Laboratory. Part of the data in Table I is based on 50 comparisons of
these two sets for each measurement. The measurements (H) taken on the
Farringdon Street mandibles by Miss B. G. E. Hooke† were also available for
comparison with the measurements (Cl) of all 16 characters taken by the present
writer on the same series. The remainder of the data in Table I is based on
comparisons of these latter two sets for each measurement. Miss Hooke followed
the original definitions given in Biometrika, XIV (1923), and, unlike the present
writer, she did not work under the direct supervision of Dr G. M. Morant.
Comparisons in connection with personal equation can further be made with the
data for repeated measurements published by him.‡ He deals with the personal
equation involved in taking two series of measurements himself on the same
material (M₁ − M₂), an interval of two years having elapsed between the first and
second measurements, and with that involved when his measurements are
compared with those of Miss M. Collett on the same series (M, − C). The
measurements of all the above series were taken on mixed series of male and
female mandibles, as it is reasonable to assume that the personal equation is the
same for both sexes.

* Biometrika, XXVIII (1936).
† Ibid. XVIII (1926).
‡ Loc. cit. (1936), Table I.


TABLE I
Data for estimating the personal equation of mandibular measurements

Characters  Maximum individual     Differences of means (Δ)                 Standard deviations of differences
            differences
            H − Cl*    Cl₁ − Cl₂†  H − Cl               Cl₁ − Cl₂            H − Cl          Cl₁ − Cl₂
w,          +1·4       +1·1        +0·04 ± ·073 (30)    −0·08 ± ·044 (50)    0·59 ± ·051     0·46 ± ?
g,g,        +1·7       ±1·8        +0·36 ± ·055 (51)    +0·13 ± ·073 (50)    0·58 ± ·039     0·77 ± ?
?           −1·3       –           +0·01 ± ·051 (36)    –                    0·45 ± ·036     –
?           +1·6       ±0·9        −0·15 ± ·043 (50)    +0·07 ± ·030 (50)    0·45 ± ·030     0·31 ± ?
cyl         +1·5 (?)   ?           −0·21 ± ·045 (32)    −0·13 ± ·034 (50)    0·38 ± ·032     0·36 ± ·024
ml          +4·1       −1·5        +0·80 ± ·092 (47)    +0·03 ± ·063 (50)    0·94 ± ·065     0·66 ± ·045
col         ?          ?           +1·04 ± ·094 (51)    +0·08 ± ·041 (50)    1·00 ± ·067     0·43 ± ·029
rb′         −0·9 (?)   ?           +0·29 ± ·054 (47)    +0·10 ± ·022 (50)    0·55 ± ·038     0·23 ± ·016
?           −2·2       ±0·8        −0·07 ± ·095 (33)    −0·12 ± ·032 (50)    0·81 ± ·067     0·34 ± ·023
h,          ?          –           ?0·44 ± ·088 (29)    –                    0·70 ± ·062     –
m,h         −1·2       +2·3        −0·27 ± ·075 (23)    −0·02 ± ·059 (50)    0·53 ± ·053     0·62 ± ·042
c,h         −1·5       −1·7        −0·26 ± ·053 (46)    −0·06 ± ·054 (50)    0·53 ± ·037     0·57 ± ·038
rl          +1·7 (?)   ?           +0·75 ± ·077 (47)    +0·02 ± ·069 (50)    0·78 ± ·054     0·72 ± ·049
M∠          +3°·5      −3°·0       −0°·16 ± ·110 (51)   −0°·60 ± ·081 (50)   1°·16 ± ·077    0°·85 ± ·057
R∠          −3°·0      −2°·0       +0°·53 ± ·131 (39)   −0°·40 ± ·083 (50)   1°·21 ± ·092    0°·87 ± ·059
C′∠         +6°·0      –           +1°·20 ± ·385 (28)   –                    3°·02 ± ·272    –

* Differences for the Farringdon Street series.  † Differences for the Spitalfields series.
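The probable errors in Table I appear to follow the usual conventions of the period: p.e. of a mean = 0·67449 σ/√n and p.e. of a standard deviation = 0·67449 σ/√(2n). The Python sketch below reproduces, for instance, the w, entries for H − Cl (σ = 0·59, n = 30 giving ·073 and ·051). The helper names are hypothetical, and computing σ with divisor n (population form) is an assumption.

```python
import math
from statistics import fmean, pstdev

PE_FACTOR = 0.67449  # probable error = 0.67449 x standard error


def pe_of_mean(sd, n):
    """Probable error of a mean of n observations with standard deviation sd."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_of_sd(sd, n):
    """Probable error of a standard deviation estimated from n observations."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


def summarize_differences(diffs):
    """Mean and SD of paired differences (two readings of the same bones),
    their probable errors, and |mean| / p.e., the ratio used to judge
    whether a mean difference departs from zero."""
    n = len(diffs)
    mean = fmean(diffs)
    sd = pstdev(diffs)  # divisor n (assumption)
    pe_m = pe_of_mean(sd, n)
    return {
        "n": n,
        "mean": mean,
        "pe_mean": pe_m,
        "sd": sd,
        "pe_sd": pe_of_sd(sd, n),
        # If every difference is identical the p.e. is zero; report infinity.
        "ratio": abs(mean) / pe_m if pe_m else math.inf,
    }
```

A mean difference is then treated as differing from zero when "ratio" exceeds about 3, as in the discussion of Table I.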


The maximum differences found are given in columns 2 and 3 of Table I of
the present paper, and in the same columns of Table I in Morant's paper. These
maximum differences are of much the same order for the sets of differences
(M, − C), (M₁ − M₂) and (Cl₁ − Cl₂), but for the set (H − Cl) they are considerably
larger for the characters h,, ml, c,l, m,p, and C′∠. Restricting comparisons to
the 13 characters whose differences are available for all four sets, it is found that
for seven of these (H − Cl) has the greatest maximum difference. The effect of
personal equation on mean values may now be considered. Columns 4 and 5 of
Table I of the present paper and of Table I of Morant's paper give the differences
of the means (Δ) for the four sets of differences and the probable errors of these
constants. In the set of differences (M, − C), 9 of the total 16 characters have
values which differ from zero by less than three times their probable errors. In
the set (M₁ − M₂) there are 11 out of the 16 showing the same relationship. In
the set (Cl₁ − Cl₂) there are 8 out of 13 showing the same relationship. In the set
(H − Cl) only 4 characters from a total of 16 have values which differ from zero
by less than three times their probable errors. It is again evident that the
differences (H − Cl) are clearly distinguished from the other three sets. It will be
sufficient in comparing mean values of the differences for the different sets to use
the (M, − C) values and to ignore the values of (M₁ − M₂), as it has been shown by
Morant that the former are on the whole the more reliable. In the comparison
of Δ(M, − C) and Δ(Cl₁ − Cl₂) for the 13 characters possible, 2 of the differences,
irrespective of signs, exceed three times their probable errors. These are for c,h
(difference/p.e. difference = 3·5) and M∠ (3·8). In the first of these cases the
difference (M, − C) is the greater, but for M∠ the reverse is true. Thus the only
character for which the standard of accuracy of the present writer is appreciably
less than that previously demanded (a demand made in consideration of Dr
Morant's statistical treatment of the measurements dealt with in his paper) is
M∠. An explanation of this fact is found in the circumstances under which this
character was measured. The Spitalfields series, on which the measurements
were taken to obtain the set of differences (Cl₁ − Cl₂), contains a large percentage
of mandibles lacking a condylar process, or having one of these processes badly
damaged. The positioning of the mandible for measuring M∠ is a matter of
approximation in these cases, and such approximation on the mandible board is
quite likely to give rise to unusually large errors. With subsequent laboratory
experience, the present writer would no longer include measurements as
doubtful as some of those recorded for the Spitalfields series, and it therefore
seems reasonable to suggest that the inaccuracies consequent upon taking too
many doubtful measurements are responsible for the unsatisfactory nature of his
earlier readings of M∠.
Comparisons irrespective of signs may now be made between Δ(M, − C) and
Δ(H − Cl). Out of the 16 comparisons possible, 6 differences are greater than 3·5
times their probable errors. These are for … (3·9), g,g, (4·3), c,c, (5·3), ml (6·8),
rl (7·6) and c,l (8·0). In all except the third of these cases Δ(H − Cl) is greater than
Δ(M, − C). Three of these measurements are taken on the mandible board, and it
is clear that Miss Hooke had a conception of the definitions different from that
of the present writer, her measurements of these characters all tending to be
greater than his.* A detailed comparison of the differences of the means
(Cl₁ − Cl₂) and (H − Cl) need not be made, for in 10 cases out of 13 the latter are
the greater. It is thus again evident that the differences (H − Cl) are of far greater
account than any other differences available.

* That this difference in interpretation of the definitions exists in the case of these two workers
is further illustrated by an examination of the numbers of mandibles on which either took any one
of the 3 significantly different mandible board measurements, while the other omitted it. Such
an examination shows that Miss Hooke took 23 measurements of these characters where the
present writer did not, and that the latter took 11 of the bilateral measurements (rl) on the right
where Miss Hooke took them on the left. It is clear that all estimates of personal equation are
likely to be considerably influenced by measurements taken on imperfect specimens, and that
variabilities of differences will be very much reduced if questionable readings are omitted. In the
present instance wider discrepancies would have been evident in the set of differences (H − Cl), but
for the fact that throughout the measurement of the whole series the present writer refrained from
taking measurements in 104 doubtful cases for which Miss Hooke had given readings.

Comparing the standard deviations of the differences in the same way, it is
found that for 4 of the 13 characters the difference between the standard
deviations of (M, − C) and (Cl₁ − Cl₂) exceeds 3·5 times its probable error. These
are for m,h (3·7), g,g, (3·8), rb′ (5·6) and c,l (5·9). In all cases except the third the
standard deviation of (Cl₁ − Cl₂) is in excess. A comparison of the standard
deviations of the differences (M, − C) and (H − Cl) shows that 7 are significant
from a total of 16 characters. These are for c,l (5·1), m,p, (5·2), R∠ (5·4), … (5·5),
zz (5·9), c,l (7·3) and w, (7·4). In all these cases the standard deviation of (H − Cl)
is the greater. A detailed comparison of the standard deviations of the differences
for (Cl₁ − Cl₂) and (H − Cl) need not be made for, just as in the case of the
differences of the means, in 10 comparisons out of 13 the (H − Cl) constant is
greater than the corresponding (Cl₁ − Cl₂)
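Each comparison above sets a difference between two constants against its probable error. A minimal Python sketch follows, assuming the two constants come from independent samples (here, different series) so that p.e.(a − b) = √(p.e.ₐ² + p.e.ᵦ²). The function names and the default threshold of 3·5 are illustrative assumptions, not the author's procedure.

```python
import math


def difference_ratio(a, pe_a, b, pe_b):
    """|a - b| divided by its probable error, assuming a and b are
    independent so that p.e.(a - b) = sqrt(pe_a**2 + pe_b**2)."""
    return abs(a - b) / math.hypot(pe_a, pe_b)


def is_significant(ratio, threshold=3.5):
    """The text treats ratios above about 3 to 3.5 as significant."""
    return ratio > threshold
```

For example, the mean differences for ml in Table I, +0·80 ± ·092 (H − Cl) and +0·03 ± ·063 (Cl₁ − Cl₂), give a ratio of about 6·9.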

It is evident, from the comparisons made between mean differences and 
standard deviations of differences, that the two sets of readings taken on the 
same English mandibles by the present writer indicate errors of personal equation 
which are almost precisely of the same order as those found between the readings 
taken by Miss Collett and G. M. Morant on an Egyptian series. Where the
most significant differences between the corresponding constants were found,
viz. in the case of a few of the standard deviations of the differences, the
readings (Cl₁ − Cl₂) are slightly less consistent than the readings (M, − C). It was
shown by Morant in his paper that the errors indicated in the latter case (M, − C)
for the characters used in the present paper are not large enough to invalidate 
inter-racial comparisons, and it seems safe to assume that the same will be true 
for the readings taken by the present writer.* 


* The measurements used in this paper were selected by Morant from a larger number originally 
defined mainly on the grounds that they were found to be the most accurate ones. The tests used 
in making the selection depended on comparison of the differences of means found between two 
sets of readings on the same mandibles with the probable errors of the means of an Egyptian series, 
and on a second comparison of the standard deviations of the differences with the standard deviations
of the same Egyptian series. Full details of the method used are given in his paper. Applying the
same tests to the set of differences (Cl₁ − Cl₂), it is found that 4 out of 13 characters fall short of the
standard accepted in that paper. These are c,l, m,h, M∠ and m,p,. The last did not satisfy the
tests in the case of Morant's own data, but its continued use was recommended, since it is but little
less reliable than the other characters accepted and it is a measurement of particular interest. The
lack of reliability in the case of the characters c,l and m,h may perhaps be explained by the failure
on the part of the present writer to reject the specimens on which it was doubtful whether a close
enough approximation of the measurement could be obtained, and by his inability to deal
effectively with the condylar anomalies met with in taking the measurement c,l on the Spitalfields
mandibles, the first series he measured. In the case of the differences (H − Cl) 10 of the 16 characters
fail to fulfil the requirements of the tests, and for this comparison the measurement C′∠ is
found to be the least reliable. It is, therefore, of interest to note that for the differences between
Dr Morant's and the author's readings for this angle the tests are satisfied, although they were
not satisfied for Morant's and Collett's original data (see Table II below).
When my readings on the Farringdon Street mandibles are compared with
those of Miss Hooke on the same series, differences of a markedly higher order
are found, whether the means or the standard deviations of the differences are
considered. The same discordance is found when the set of differences (H − Cl)
is compared with the set (M, − C). It is, therefore, reasonable to suppose that
Miss Hooke’s measurements on any series, when compared with those of any one 
of the other three measurers on the same series, would not have agreed with theirs 
as closely as theirs would have with one another. It is not possible to assert 
that she measured less accurately than they did, since the relations observed 
above between her measurements and those of the other three measurers are 
probably due to the fact that she had only the original unrevised definitions as 
a guide to her measuring, and that she was not able to work in consultation with 
anyone who had previously applied the technique. 

To close this section on personal equation, a comparison is made between the
measurements of C′∠ (hitherto regarded as the most unsatisfactory of the
characters included in the technique used in the present paper) taken by Morant
and by the writer. This measurement was recorded for three series of mandibles
by both these workers on account of its suspected unreliability. The results are
set out in Table II below. The mean difference for the combined series differs
significantly from zero, but both it and the standard deviation of the differences
are less than the corresponding constants (M, − C), though not significantly so.


TABLE II
Data for estimating the personal equation of the mental angle (C′∠)

                      Maximum        Differences of         Standard deviations
Series                differences    means (Δ)              of differences
                      M − Cl         M − Cl                 M − Cl
Australian            ±2°·5          +0°·51 ± ·096 (62)     1°·12 ± ·068
Punjabi               ?              +0°·41 ± ·138 (28)     ?
Farringdon Street     +3°·0          +0°·29 ± ·114 (43)     1°·11 ± ·081
Combined              +3°·0          +0°·42 ± ·065 (133)    ? ± ·046


On comparison with the constants of the set of differences (H − Cl) in Table I
for the character C′∠, it is found that the mean for the combined differences
(M − Cl) is the smaller, though not significantly so, while the standard deviation
of the differences for (M − Cl) is also the smaller, and markedly so, the difference
between the values for (H − Cl) and (M − Cl) in this case being 6·9 times its
probable error. From consideration of the above results, it seems reasonable to
assume that a marked improvement has occurred in the accuracy with which the
mental angle has been taken. Apparently the difficulties attending the
measurement of this important character can be overcome by care and practice
in measurement, and further improvement in accuracy would be expected if
better designed instruments were substituted for those now in use, which are far
from satisfactory.


4. Methods of sexing the material. In cases where the sexes are not known, it 
is probably impossible to secure absolute accuracy in sexing a series of crania or 
mandibles by anatomical inspection or any other method. Assessment of the sex 
depends on the nature of the whole series dealt with, and not on standardized 
conceptions of characters of the bone that remain constant for every possible 
series. The sexing of any series of bones available is, however, of great importance, 
as the statistical treatment of osteometric material demands that each sex be 
considered separately. Mathematical methods of sexing have therefore been 
devised to supplement anatomical sexing, such methods being based on the 
combined values of certain metrical characters of the bones. It can be assumed 
that the distribution of any of these particular characters for either sex, in the 
case of a homogeneous series, will be approximately normal. The most suitable 
characters for discriminatory purposes are those whose means differ most in 
proportion to the standard deviations of the distributions for the two sexes. The 
characters chosen should also have low intra-racial correlations, and it is an 
advantage if they can be found for a high percentage of bones, so that as few 
specimens as possible will have to be left unsexed. 

Dr E. S. Martin has discussed several methods of sexing mandibles in a
recently published paper.* He came to the conclusion that the most effective
characters for sexing purposes, and those which fulfilled the above conditions
most adequately, were g,g,, c,l, c,h and M∠, and these have been used for the
purpose in the present paper. He also showed that anatomical sexing is far more 
reliable than had been generally supposed, the percentage agreement between 
mathematical and anatomical sexing being so high that considerable reliance 
can be placed on anatomical sexing alone. The method finally used in sexing two 
of the series of mandibles dealt with in this paper was a combination of a mathe- 
matical method with that of anatomical inspection in the cases where the sex 
was doubtful. Dr Martin demonstrated that very little difference is made to the 
accuracy of mathematical sexing by the inclusion of aged mandibles in a series, 
and hence no account has been taken of the relative ages of the adult mandibles. 
The sexing of the mandibles dealt with in this paper was carried out before 
Dr Martin’s work was published, and hence the more elaborate methods of sexing 
given in his paper were not applied. Moreover, the series dealt with here are so 
short that it seems reasonable to suggest that there would be very little practical 
advantage in applying the more elaborate methods which he found to give only 
slightly better results. It will, however, be interesting to note how far the
admittedly cruder mathematical methods here employed give results in
agreement with anatomical sexing.

* Loc. cit.

For the purpose of sexing the mandibles, it was assumed that the proportion 
of males to females in the sample to be sexed is the same as that of the cranial 
series to which it belongs. The less crude of the two mathematical methods of
sexing used, referred to here as method I, is concerned, theoretically, with the
sum of the four ratios obtained by dividing the deviations from the means of the
characters g,g,, c,l, c,h and M∠ by the corresponding standard deviations for
the total series. The mean values for the characters g,g,, c,l and c,h are higher for
males than females, but the reverse is true for M∠, and hence the ratio for this
last character has to be subtracted from the sum of the ratios of the other three for
each mandible. There are 95 Spitalfields mandibles, and if the proportions of the
sexes are supposed to be the same for these as for the crania we must take 32 as
female and 63 as male.* The 32 with the lowest scores will be counted female.
In actual practice, however, the absolute measurements themselves, and not their
deviations from the means, were divided by the standard deviations for the total
series. The two procedures differ only by a constant added to every score, so they
arrange the mandibles in the same order, and dividing the absolute measurements
entails less calculation.
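Method I can be sketched in Python as follows. The sketch assumes each mandible is a dict with hypothetical keys g_g, c_l, c_h and M_angle, takes the standard deviations over the whole unsexed series (population form; using the sample form rescales every character by the same factor and leaves the ranking unchanged), and expects the number to be classed female to be supplied from the cranial proportions. It is an illustration, not the author's worksheet.

```python
from statistics import pstdev

# Hypothetical keys: three characters larger in males, and the
# mandibular angle (degrees), which is larger in females.
MALE_LARGER = ("g_g", "c_l", "c_h")
FEMALE_LARGER = "M_angle"


def method_i_scores(series):
    """Score = sum(x / sd) over MALE_LARGER minus M_angle / sd, with sds
    taken over the whole series. Using raw values instead of deviations
    from the means shifts every score by the same constant."""
    keys = MALE_LARGER + (FEMALE_LARGER,)
    sd = {k: pstdev([m[k] for m in series]) for k in keys}
    return [
        sum(m[k] / sd[k] for k in MALE_LARGER) - m[FEMALE_LARGER] / sd[FEMALE_LARGER]
        for m in series
    ]


def sex_by_rank(scores, n_female):
    """Class the n_female lowest-scoring mandibles as female, the rest male."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    female = set(order[:n_female])
    return ["F" if i in female else "M" for i in range(len(scores))]
```

For the Spitalfields series the call would be sex_by_rank(method_i_scores(series), 32).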

The second method tried proceeds as follows: the measurements of each
mandible for the characters g,g,, c,l and c,h were added together, and in each instance
the measurement of M∠ was subtracted from this total despite the difference of
units of measurement used, viz. millimetres for the first three and degrees for
the last character. The 32 mandibles with the lowest totals were classed as female.
This method will be referred to as method II.
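Method II reduces to a raw total, as in this minimal Python sketch with the same hypothetical keys as before. Adding millimetres and subtracting degrees gives the angle an arbitrary weight relative to the lengths, which is presumably why the method is judged unsatisfactory in theory.

```python
def method_ii_scores(series):
    """Method II: g_g + c_l + c_h (millimetres) minus M_angle (degrees),
    ignoring the mixed units."""
    return [m["g_g"] + m["c_l"] + m["c_h"] - m["M_angle"] for m in series]


def sex_by_rank(scores, n_female):
    """Class the n_female lowest totals as female, the rest male."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    female = set(order[:n_female])
    return ["F" if i in female else "M" for i in range(len(scores))]
```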

The Spitalfields series was sexed anatomically by Dr G. M. Morant, who in
doing this accepted the proportions of males to females given by the crania, and
it will be interesting to note the percentage agreements between the methods
used. These are 87·4 between inspection and method I; 85·3 between inspection
and method II; and 95·8 between method I and method II. The present writer
also sexed the Spitalfields series anatomically, and there is an agreement between
this sexing and that obtained from the application of method I in 83·2 per cent.
of cases. It is surprising to find that such a high percentage agreement between
anatomical and mathematical sexing is obtained in the case of method I. This is
a crude method, since the four characters used are assumed to be of equal
importance for sexing purposes and no account is taken of correlations between them.
It is more surprising still to find a percentage agreement nearly as high when
method II is used, since this method is wholly unsatisfactory from a theoretical
point of view.† As the above percentage agreements between anatomical and
metrical estimates are of the same order as those obtained from the more elaborate
methods discussed by Dr Martin, it seems fairly reasonable to conclude that the
crude mathematical method (I) is satisfactory. The mandibles of the Spitalfields
series which had been classed oppositely by method I and by anatomical
inspection were re-examined anatomically, and on this re-examination the sex
of the doubtful cases was decided. The crania to which 12 of the mandibles of the
Spitalfields series had been attached were known, and these had been sexed
anatomically. The accepted sexes of these 12 mandibles (all those numbered
under 1000) were now compared with the results obtained from an anatomical
sexing of the corresponding skulls, and in 9 cases agreement was found. Three
cases disagreed, viz. No. 509, mandible sexed male, skull sexed female;
No. 401, mandible sexed female, skull sexed male; No. 507, mandible sexed
female, skull sexed male. In each of these cases both methods of sexing the
mandible gave the same result, which was accepted in spite of the disagreement
with the sexing of the cranium.

* Of the 883 adult Spitalfields crania which were sexed, 590 (66·8 per cent.) were supposed male
and 293 (33·2 per cent.) female.

† If either c,h or c,l is taken singly as a criterion of sex there is agreement between these
estimates based on single characters and anatomical sexing of 76·8 per cent. in both cases, and an
agreement between the two of 66·3 per cent.
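The percentage agreements quoted are consistent with 83, 81, 91 and 79 agreements out of the 95 Spitalfields mandibles (87·4, 85·3, 95·8 and 83·2 per cent.). When both sexings class exactly 32 as female, every mandible called female by one and male by the other is balanced by one with the opposite discrepancy, so disagreements come in pairs. The minimal Python sketch below, with hypothetical function names, computes the agreement and lists the cases to be re-examined.

```python
def percent_agreement(a, b):
    """Percentage of mandibles assigned the same sex by two sexings."""
    if len(a) != len(b) or not a:
        raise ValueError("sexings must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def disagreements(a, b):
    """Indices of mandibles classed oppositely, for anatomical re-examination."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```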

The original population from which the Farringdon Street series, comprising 
90 mandibles, was drawn is represented by 381 crania sexed in the ratio 213 
female and 168 male. Using the same ratio we thus obtain for the mandibles 50 
female and 40 male. The percentage agreement between the mathematical
method (I) of sexing described and anatomical inspection was 88·9 per cent.
for Dr G. M. Morant's inspection and 86·7 per cent. for the writer's. There was an
agreement of 86·7 per cent. between the two anatomical estimates. A
re-examination of the 10 doubtful cases anatomically, as in the case of the
Spitalfields series, finally decided their sexes.
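The numbers of each sex assumed for the two English series follow from applying the cranial proportions to the mandibles. A minimal Python sketch is given below; rounding the female count to the nearest whole number is an assumption, since the text gives only the results (32 and 63 for Spitalfields, 50 and 40 for Farringdon Street).

```python
def expected_sex_numbers(n_mandibles, female_crania, male_crania):
    """Split n_mandibles in the cranial sex ratio, rounding the female count
    to the nearest integer (assumed rule); males take the remainder."""
    total = female_crania + male_crania
    n_female = round(n_mandibles * female_crania / total)
    return n_female, n_mandibles - n_female
```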

A percentage agreement between anatomical and mathematical methods of 
sexing lies at best somewhere between 85 and 90 per cent., and it is hardly possible 
to improve upon this on account of the presence of border-line cases in the 
samples.* The most satisfactory way to sex a series of mandibles, for which the 
ratio of males to females is assumed known, seems to be that of re-examination 
of those mandibles which have been sexed oppositely by the two methods, and 
finally sexing these cases mainly on anatomical grounds. The mathematical 
method is then merely a subsidiary one used to support anatomical sexing. It 
should be noted that the method of sexing adopted in the case of the two English 
series depends essentially on the assumption that the proportions of the sexes 
are the same for the mandibular as for the corresponding cranial samples. If 
this assumption is incorrect some of the bones will inevitably be sexed incorrectly. 

* The Australian series of mandibles, of which the sexes were known from the cranium, from
the skeleton, or from more direct evidence, was sexed anatomically by Dr G. M. Morant and the
writer. The anatomical sexing agreed with the sex assigned in the catalogue in 88 per cent. of cases
in Dr Morant's assessment, and in 86 per cent. of cases in my own.

An examination of the sex ratios of the absolute measurements (i.e. male
mean/female mean) for the Spitalfields, Farringdon Street and Australian series
shows that for any particular character there is a close agreement between these
ratios for the different series. The small numbers that go to make up most of the
female series, however, give very little weight to any conclusions drawn from such 
ratios. Nevertheless, it is interesting to note that the generally accepted idea that 
the differences of sex are more marked among primitive than among civilized 
peoples is not borne out by the ratios for the three series concerned, the Australian 
ratio being the greatest only in 4 (and equal to the Farringdon Street in 2) 
out of 13 comparisons. The heights have the largest ratios, but they are not 
markedly higher than those for the remaining measurements. The sex ratios 
tend to be higher for mandibular than for cranial characters in the case of a 
particular series. This has been shown in the case of the Kerma from Egypt, and 
it is also true for the Spitalfields and Farringdon Street series. 
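The sex ratio used above is simply the male mean divided by the female mean for each absolute measurement. The following is a minimal sketch, assuming the means are held in plain dictionaries keyed by character. It computes the ratios and tallies how often one series has the greatest ratio, or ties for it, in the manner of the "4 (and equal in 2) out of 13" count. The function names, the character keys and all the numbers below are hypothetical illustrations, not data from the paper.

```python
def sex_ratios(male_means, female_means):
    """Sex ratio (male mean / female mean) for every character measured in both sexes."""
    return {c: male_means[c] / female_means[c] for c in male_means if c in female_means}


def tally_greatest(ratios_by_series, target, ndigits=2):
    """Return (greatest, tied, total) for the series `target`.

    greatest: characters for which its ratio is strictly the largest;
    tied: characters for which it shares the largest ratio;
    total: characters common to all series.
    Ratios are rounded to `ndigits` decimals before comparison (an assumption
    about how ties arise in printed tables).
    """
    common = set.intersection(*(set(r) for r in ratios_by_series.values()))
    greatest = tied = 0
    for c in common:
        rounded = {s: round(r[c], ndigits) for s, r in ratios_by_series.items()}
        top = max(rounded.values())
        leaders = [s for s, v in rounded.items() if v == top]
        if leaders == [target]:
            greatest += 1
        elif target in leaders:
            tied += 1
    return greatest, tied, len(common)


# Hypothetical means (mm) for two series "A" and "B"; NOT data from the paper.
ratios = {
    "A": sex_ratios({"w1": 120.0, "ml": 104.0}, {"w1": 112.0, "ml": 97.0}),
    "B": sex_ratios({"w1": 118.0, "ml": 100.0}, {"w1": 110.0, "ml": 95.0}),
}
print(tally_greatest(ratios, "A"))  # (1, 1, 2)
```

Rounding before comparison is what allows a tie such as "equal to the Farringdon Street in 2" to appear at all. With unrounded ratios, ties would almost never occur.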


5. Racial differences in variability. It is possible to draw an unambiguous 
conclusion as to relative racial variability with respect to a measurable character 
of one series compared with the corresponding character of another, and one 
approach to the problem of variability would be to make no statement concerning 
relative racial variability, unless it referred merely to variabilities of single 
characters in different series. An alternative method of approach is to assume 
that an estimate of general variability for all characters measured can be obtained. 
When, however, a large number of characters are considered, any statement 
about relative total variability (i.e. variability for all characters considered 
together) must be somewhat arbitrary, since it depends on the particular method 
of comparison employed. When one race shows a consistently higher variability 
than another, in the case of all characters showing a significant difference, we 
can reasonably assume that it is the more variable. When an equal number of 
significantly different characters are found for each race in excess of the other, 
then it seems impossible to assign greater variability to either. 
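The decision rule in the preceding paragraph can be written down directly. The sketch below is illustrative only. For one pairwise comparison, it takes the list of series favoured by each significant difference. The text states the outcome in two cases: when all significant differences favour one series, and when they are evenly split. Two further choices are assumptions added here, not rules stated by the author: returning None for an unequal mixed split, and the optional `min_differences` parameter.

```python
from collections import Counter


def more_variable(significant_winners, min_differences=1):
    """Return the series judged more variable in one pairwise comparison, or None.

    `significant_winners` has one entry per character whose variability differs
    significantly, naming the series with the greater constant. A series is
    returned only if every significant difference favours it and there are at
    least `min_differences` of them (a hypothetical safeguard, not from the text).
    """
    counts = Counter(significant_winners)
    if len(counts) == 1:
        (series, n), = counts.items()
        if n >= min_differences:
            return series
    return None


# Female Spitalfields : Australian in Table V (four differences, all Aus.>)
print(more_variable(["Aus", "Aus", "Aus", "Aus"]))  # Aus
# Male Spitalfields : Australian in Table V (one difference each way)
print(more_variable(["SF", "Aus"]))  # None
```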

In Tables III and IV are given the standard deviations for all characters and the coefficients of variation for the absolute measurements, respectively. Using these coefficients for absolute measurements and the standard deviations for angles and indices, a comparison for characters considered singly may be made between the four series of mandibles for which constants are given in the tables. The results of such comparisons for the four new series, based on the 21 characters, are set out in Table V. In column 2 is given, for each series in a particular comparison, the number of characters having the greater constants of variability. In column 3 are shown the characters which differ significantly in the comparison and, where such a significant difference occurs, the series for which the variability of the significant character is the greater. An examination of Table V shows that it would be unsafe to infer from the results for the male series that one race is definitely more variable than another, since few markedly significant differences are found. An examination of the comparisons for the female series does incline one to assume that the Spitalfields series is less variable than
TABLE III
Standard deviations for series of mandibles*

Character | Male: Spitalfields | Male: Farringdon Street | Male: Punjabi | Male: Australian | Female: Spitalfields | Female: Farringdon Street | Female: Australian
w, | 5·40 ± ·49 | 3·75 ± ·37 | 5·77 ± ·40 | 5·97 ± ·39 | 3·77 ± ·42 | 5·16 ± ·45 | 6·28 ± ·59
IoIo | 6·87 ± ·41 | 6·58 ± ·50 | 5·21 ± ·36 | 7·83 ± ·49 | 5·14 ± ·43 | 5·74 ± ·39 | 5·61 ± ·56
c,c, | | 5·44 ± ·48 | 4·56 ± ·32 | 5·97 ± ·39 | 3·56 ± ·35 | 4·75 ± ·38 | 6·30 ± ·57
zz | 2·43 ± ·15 | 2·21 ± ·17 | 2·19 ± ·15 | 2·49 ± ·15 | 2·24 ± ·19 | 2·56 ± ·17 | 2·79 ± ·22
c,l | 1·68 ± ·14 | 1·72 ± ·14 | 1·83 ± ·12 | 2·01 ± ·12 | 1·32 ± ·13 | 1·55 ± ·11 | 2·29 ± ·20
ml | 5·41 ± ·40 | 5·51 ± ·45 | 6·27 ± ·44 | 3·92 ± ·24 | 4·50 ± ·41 | 5·97 ± ·43 |
C,l | 3·98 ± ·24 | 3·76 ± ·28 | 4·54 ± ·32 | 5·05 ± ·31 | 3·10 ± ·26 | 3·93 ± ·27 | 4·55 ± ·39
rb' | 2·37 ± ·14 | 2·60 ± ·20 | 2·59 ± ·18 | 3·01 ± ·19 | 1·98 ± ·17 | 2·61 ± ·18 | 2·79 ± ·23
m,p, | 1·94 ± ·13 | 1·22 ± ·12 | 1·63 ± ·13 | 1·52 ± ·09 | 1·06 ± ·12 | 1·44 ± ·16 | 1·42 ± ·12
h, | 2·49 ± ·18 | 2·34 ± ·32 | 2·15 ± ·24 | 2·97 ± ·20 | 3·07 ± ·32 | 2·73 ± ·26 | 2·36 ± ·23
m,h | 1·68 ± ·11 | 2·88 ± ·32 | 2·72 ± ·23 | 2·42 ± ·15 | 2·29 ± ·24 | 2·60 ± ·36 | 2·43 ± ·22
c,h | 5·05 ± ·30 | 4·38 ± ·33 | | 5·13 ± ·32 | 3·86 ± ·33 | 4·91 ± ·33 | 4·93 ± ·40
rl | 5·02 ± ·33 | 3·54 ± ·28 | 4·22 ± ·29 | 5·10 ± ·32 | 3·26 ± ·29 | 4·23 ± ·31 | 5·39 ± ·46
M∠ | 7°·03 ± ·42 | 5°·67 ± ·43 | 5°·65 ± ·39 | 6°·55 ± ·41 | 5°·84 ± ·49 | 6°·21 ± ·42 | 5°·01 ± ·43
R∠ | 7°·74 ± ·53 | 8°·50 ± ·67 | 7°·88 ± ·54 | 7°·10 ± ·44 | 6°·08 ± ·56 | 8°·96 ± ·62 | 5°·69 ± ·50
C∠ | 6°·67 ± ·47 | 7°·99 ± ·95 | 8°·36 ± ·85 | 5°·68 ± ·43 | 6°·73 ± ·70 | 6°·13 ± ·56 | 5°·16 ± ·55
100 c,h/ml | 4·77 ± ·35 | 5·73 ± ·47 | 6·16 ± ·43 | 5·47 ± ·35 | 3·94 ± ·36 | 4·79 ± ·35 | 4·73 ± ·42
100 c,c,/ml | 6·27 ± ·51 | 7·67 ± ·73 | 6·02 ± ·43 | 6·69 ± ·44 | 4·98 ± ·50 | 6·03 ± ·53 | 7·14 ± ·67
100 g,g,/c,l | 11·59 ± ·70 | 11·55 ± ·87 | 10·37 ± ·72 | 11·92 ± ·77 | 10·98 ± ·93 | 10·72 ± ·72 | 9·70 ± ·96
100 rb'/rl | 5·14 ± ·34 | 4·82 ± ·38 | 7·01 ± ·49 | 4·99 ± ·31 | 4·06 ± ·37 | 6·45 ± ·47 | 6·23 ± ·53
100 g,g,/c,c, | 7·25 ± ·51 | 8·57 ± ·76 | 5·51 ± ·39 | 8·51 ± ·59 | 6·33 ± ·62 | 7·53 ± ·61 | 8·19 ± ·83

* The numbers of mandibles on which the constants in this table and in Table IV are based can be seen from the table of means (Table IX).


TABLE IV
Coefficients of variation for series of mandibles

Character | Male: Spitalfields | Male: Farringdon Street | Male: Punjabi | Male: Australian | Female: Spitalfields | Female: Farringdon Street | Female: Australian
w, | 4·50 ± ·41 | 3·19 ± ·32 | 4·96 ± ·35 | 4·04 ± ·27 | 3·33 ± ·37 | 4·73 ± ·42 | 5·65 ± ·53
IoIo | 7·13 ± ·43 | 6·73 ± ·51 | 5·61 ± ·39 | 8·21 ± ·52 | 5·86 ± ·49 | 6·70 ± ·45 | 6·47 ± ·64
c,c, | 5·57 ± ·39 | 5·67 ± ·51 | 4·84 ± ·34 | 6·30 ± ·41 | 3·92 ± ·38 | 5·18 ± ·42 | 7·17 ± ·65
zz | 5·39 | 5·03 ± ·38 | 4·99 ± ·34 | 5·25 ± ·30 | 5·16 ± ·44 | 5·94 ± ·40 | 6·09 ± ·49
c,l | 8·19 ± ·67 | 8·69 ± ·72 | 8·97 ± ·61 | 9·39 ± ·59 | 6·95 ± ·68 | 8·61 ± ·62 | 11·80 ± 1·03
ml | 5·28 ± ·39 | 5·29 ± ·43 | 6·12 ± ·43 | 3·63 ± ·22 | 4·57 ± ·41 | 6·01 ± ·44 | 4·77 ± ·42
C,l | 5·37 ± ·32 | 4·93 ± ·37 | 6·10 ± ·43 | 6·07 ± ·38 | 4·58 ± ·39 | 5·62 ± ·38 | 5·99 ± ·52
rb' | 7·36 ± ·44 | 8·41 ± ·63 | 8·41 ± ·58 | 8·77 ± ·52 | 6·87 ± ·58 | 9·22 ± ·62 | 8·83 ± ·73
m,p, | 6·93 ± ·49 | 4·33 ± ·44 | 5·70 ± ·47 | 5·00 | 4·03 ± ·44 | 5·16 ± ·56 | 4·86 ± ·42
h, | 7·71 ± ·55 | 7·57 ± 1·05 | 6·44 ± ·72 | 8·92 ± ·61 | 10·62 ± 1·12 | 9·19 ± ·86 | 7·66 ± ·74
m,h | 6·40 ± ·44 | 11·57 ± 1·28 | 10·54 ± ·88 | 9·20 ± ·57 | 9·79 ± 1·03 | 11·02 ± 1·54 | 9·92 ± ·90
c,h | 7·79 | 6·75 ± ·51 | 7·90 ± ·56 | 8·02 ± ·50 | 6·75 ± ·57 | 8·67 ± ·58 | 8·82 ± ·72
rl | 8·04 ± ·53 | 5·69 ± ·45 | 6·75 ± ·47 | 8·11 ± ·51 | 5·72 ± ·52 | 7·91 ± ·58 | 9·57 ± ·83
the Farringdon Street and the Australian, since the former shows the lesser variability for most of the characters in these comparisons.
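Each entry in column 3 of Table V is a difference between two constants divided by its probable error. This is a minimal sketch, assuming the two constants are independent, so that the probable error of their difference is the square root of the sum of the squared probable errors. The 3·0 cut-off used below is an assumed value, since the text does not state the limit adopted, and the function names are illustrative. The example uses female values from Table IV (Spitalfields against Australian) for w,, c,c,, c,l and rl, and it reproduces the ratios 3·6, 4·3, 3·9 and 3·9 printed in Table V. The character ml is included to show a non-significant case.

```python
import math


def ratio_to_probable_error(a, pe_a, b, pe_b):
    """|a - b| in units of the probable error of the difference (independent estimates)."""
    return abs(a - b) / math.hypot(pe_a, pe_b)


def compare_series(constants_a, constants_b, threshold):
    """Tally the greater constants and list the significant differences.

    Each argument maps a character to (constant, probable error). Returns
    (greater in A, greater in B, ties, sorted list of (character, ratio, winner)).
    `threshold` is the assumed significance limit in probable-error units.
    """
    greater_a = greater_b = ties = 0
    significant = []
    for char in constants_a.keys() & constants_b.keys():
        (va, pa), (vb, pb) = constants_a[char], constants_b[char]
        if va > vb:
            greater_a += 1
        elif vb > va:
            greater_b += 1
        else:
            ties += 1
        r = ratio_to_probable_error(va, pa, vb, pb)
        if r > threshold:
            significant.append((char, round(r, 1), "A" if va > vb else "B"))
    return greater_a, greater_b, ties, sorted(significant)


# Coefficients of variation (value, probable error) from Table IV, female series.
spitalfields = {"w,": (3.33, 0.37), "c,c,": (3.92, 0.38), "c,l": (6.95, 0.68),
                "ml": (4.57, 0.41), "rl": (5.72, 0.52)}
australian = {"w,": (5.65, 0.53), "c,c,": (7.17, 0.65), "c,l": (11.80, 1.03),
              "ml": (4.77, 0.42), "rl": (9.57, 0.83)}
print(compare_series(spitalfields, australian, threshold=3.0))
```

The full comparison in Table V runs over all 21 characters. It uses coefficients of variation for the absolute measurements and standard deviations for the angles and indices.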

A comparison (see Table VI) was next made for another group of four series, viz. two Egyptian (the Egyptian E and Kerma, the latter having been stated to be slightly more variable than another Egyptian series from Qau*), and the Australian and Spitalfields, which are assumed on such evidence as is afforded in Table V to be the most and the least variable, respectively, of the series dealt with


TABLE V
Comparisons of the variabilities of two English, an Indian, and an Australian series of mandibles

Series | Nos. of greater constants† | Significant differences
Male:
Spitalfields (SF.) : Farringdon Street (FA.) | SF. 11>, FA. 10> | m,p, (3·9) SF.>, m,h (3·8) FA.>
Spitalfields : Punjabi (Pu.) | SF. 10>, Pu. 11> | m,h (4·2) Pu.>
Spitalfields : Australian (Aus.) | SF. 8>, Aus. 13> | ml (3·7) SF.>, m,h (3·8) Aus.>
Farringdon Street : Punjabi | FA. 10>, Pu. 10>, 1 = | w, (3·7) Pu.>, 100 rb'/rl (3·5) Pu.>, 100 g,g,/c,c, (3·6) FA.>
Farringdon Street : Australian | FA. 7>, Aus. 14> | rl (3·6) Aus.>
Punjabi : Australian | Pu. 9>, Aus. 12> | IoIo (4·0) Aus.>, ml (5·2) Pu.>, 100 g,g,/c,c, (3·9) Aus.>
Female:
Spitalfields : Farringdon Street | SF. 3>, FA. 18> | 100 rb'/rl (3·9) FA.>
Spitalfields : Australian | SF. 5>, Aus. 16> | w, (3·6) Aus.>, c,c, (4·3) Aus.>, rl (3·9) Aus.>, c,l (3·9) Aus.>
Farringdon Street : Australian | FA. 12>, Aus. 9> | R∠ (4·1) FA.>

† The constants compared are coefficients of variation of the absolute measurements (Table IV) and standard deviations of the indices and angles (Table III).


there. There seems to be no marked difference between the Kerma and the Australian series, though the Kerma, it is interesting to note, appears to be considerably more variable than the other Egyptian series (Egyptian E), as does also the Australian, especially in the comparison for the female series. A comparison of the two series which appear to show the least variability (viz. the


* G. M. Morant, loc. cit. p. 102. 
TABLE VI
Comparisons of the variabilities of two Egyptian, an Australian, and an English series of mandibles

Series | Nos. of greater constants
Male:
Kerma (K.) : Australian (Aus.) | K. 10>, Aus. 11>
Kerma : Egyptian E (Eg.) | K. 16>, Eg. 5>
Kerma : Spitalfields (SF.) | K. 15>, SF. 6>
Egyptian E : Australian | Eg. 6>, Aus. 14>, 1 =
Egyptian E : Spitalfields | Eg. 8>, SF. 12>, 1 =
Spitalfields : Australian | SF. 8>, Aus. 13>
Female:
Kerma : Australian | K. 11>, Aus. 10>
Kerma : Egyptian E | K. 17>, Eg. 3>, 1 =
Egyptian E : Australian | Eg. 7>, Aus. 14>
Egyptian E : Spitalfields | Eg. 14>, SF. 7>
Spitalfields : Australian | SF. 5>, Aus. 16>

Significant differences:
rl (4·2) K.>
c,l (4·6) Aus.>, ml (4·1) Eg.>
m,h (4·6) Eg.>
ml (3·6) SF.>, m,h (3·8) Aus.>
M∠ (3·7) K.>, 100 c,h/ml (3·6) K.>,
c,h (5·0) K.>
c,h (3·7) Aus.>, (3·8) Aus.>
w, (4·3) Eg.>, c,c, (3·7) Eg.>
w, (3·6) Aus.>, c,c, (4·3) Aus.>, c,l (3·9) Aus.>, rl (3·9) Aus.>


Spitalfields and the Egyptian E) indicates that for the female series the Egyptian E may be considered the more variable.

Although there appear to be racial differences in variability, the material 
available for the mandible accords with the far more extensive material relating 
to cranial and living series, in showing that the absolute differences between the 
variabilities of different races are exceedingly small. It is interesting to observe 
from Tables V and VI that the Australian and Egyptian tend to be more variable 
than the two English series. This is an unexpected result, but it may be due to
some peculiar selection of the mandibles preserved. 


6. Special topics: correlations, asymmetry and records relating to teeth. The 
coefficients of correlation between the various mandibular measurements throw 
some light on the interdependence during growth of various parts of the mandible.* 
The first structural peculiarity we note from an examination of the correlations 
in Morant’s paper is the fact that the dental arcade between the mid-points of the 
alveolar margins of the first premolar and second molar (m,p,) seems to be 
uncorrelated with any other chords. Apparently, growth of this particular area 


* This section is complementary to that on correlation in G. M. Morant's paper, “A Biometric Study of the Human Mandible”, loc. cit. pp. 103-8.
ceases at an early age. Thus the coefficients of correlation between m,p, and all the antero-posterior chords were previously found to be insignificant; in the Australian male series, however, a significant value is found for m,p, and c,l (r = +·328 ± ·081). This would certainly have been expected a priori, as c,l “covers” m,p,.

Since the correlation coefficients for like measurements (i.e. breadths with breadths, heights with heights, etc.) are most interesting when they are insignificant, and since the converse holds good for unlike measurements, it is of interest to note (for the Qau Egyptian series) that though h, is, as expected, correlated with m,h, yet it is not so with rl or with any other ramal height. The measurement h, is, however, fairly highly correlated with ml, and this seems to point to the fact that an antero-posterior growth of the mandible necessitates a corresponding growth in height of the corpus. It is surprising to find a low correlation coefficient between c,l and ml, two antero-posterior chords, especially as the former is “covered” by the latter. An unexpectedly high correlation is that between R∠ and 100 c,c,/ml, though what growth factor influences these two particular measurements is not evident. Another result which reveals an unexpected feature of the architecture of the mandible is the high negative correlation found between the breadth of the ramus (rb') and the mandibular angle (M∠). The broad solid ramus is apparently found on the upright-looking mandible, while the slender ramus accompanies the sloping type of mandible.

The few correlation coefficients computed for the Australian male series by the writer, and set out below, lead to the same conclusions as those derived from the Qau series. A comparison between corresponding values for the two series reveals no single significant difference. The following coefficients (with the numbers of mandibles in brackets) are found for the Australian bones:


h, and M∠: +·090 ± ·102 (43)
[?] and ml: +·516 ± ·075 (44)
h, and m,h: +·539 ± ·071 (45)
h, and m,p,: +·015 ± ·097 (48)
h, and rl: +·291 ± ·094 (43)
rb' and M∠: −·509 ± ·065 (59)
C∠ and M∠: −·299 ± ·100 (38)
[?] and [?]: +·064 ± ·088 (58)
c,l and ml: +·390 ± ·076 (57)
c,l and m,p,: +·328 ± ·081 (55)
R∠ and 100 c,c,/ml: +·548 ± ·066 (51)


It appears that all intra-racial correlations between absolute measurements of the mandible are either positive (whether significant or not) or negative and insignificant. In other words, a large mandible tends to be large in all respects, and a small one to be small in all respects, a fact for which the normal growth of the mandible as a whole is evidently responsible. The measurement m,p, alone shows no tendency to conform to the general growth trend.
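The probable errors attached to these coefficients appear to follow the classical large-sample formula 0·67449(1 − r²)/√n. When recomputed from r and the number of mandibles, they agree with the printed values. The short sketch below does this for several of the Australian coefficients and also prints the ratio |r|/PE. The function name is illustrative.

```python
import math

PE_FACTOR = 0.67449  # converts a standard error into a probable error


def probable_error_r(r, n):
    """Large-sample probable error of a correlation coefficient r based on n pairs."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)


# (label, r, n) taken from the Australian male list above.
for label, r, n in [("c,l and m,p,", 0.328, 55), ("rb' and M∠", -0.509, 59),
                    ("R∠ and 100 c,c,/ml", 0.548, 51), ("h, and M∠", 0.090, 43)]:
    pe = probable_error_r(r, n)
    print(f"{label}: r = {r:+.3f} +/- {pe:.3f}  (|r|/PE = {abs(r) / pe:.1f})")
```

Whether a given |r|/PE counts as significant depends on the multiple of the probable error adopted, which the text does not state here.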

The asymmetry of the mandible has sometimes been taken as a well-established fact, and anatomists as famous as Le Double have even made categorical statements to the effect that the right side of the mandible is on the average always greater than the left. An examination of the bilateral measurements (viz. m,p,, c,l and rb'*), which were taken on the mandibles of the Qau and Kerma male and female series, has shown that there is fairly clear evidence of a slight asymmetry in type for these Egyptian series, and that the right side of the mandible is not consistently greater than the left. In fact, the length of part of the dental arcade (m,p,) as well as the breadth of the ramus (rb') were larger on the left side than on the right, the length of the condyle (c,l) alone supporting Le Double's hypothesis. It is strange to find that the broader ramus supports the smaller condyle (at least if the length of the condyle is any criterion of its size). From Table VII it will be seen that the statistical evidence warrants no assumption of asymmetry in type in the case of the Australian mandible. The means actually found are all slightly greater on the right than on the left, but the bilateral differences are quite insignificant.

Biometrika xxix


TABLE VII
Constants of bilateral differences for the Australian male series

Constant | m,p, | c,l | rb'
Mean (L − R) | −·079 ± ·075 | −·028 ± ·096 | −·006 ± ·098
Standard deviation | 0·84 ± ·053 | 1·01 ± ·068 | 1·16 ± ·069
Number of mandibles | 57 | 50 | 64


Comparing the male Australian differences with those previously given for the two Egyptian series,† no clearly significant differences are found, so there is no evidence to show that there are racial differences in asymmetry.
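The entries of Table VII appear to use the usual probable errors of a mean, 0·67449σ/√n, and of a standard deviation, 0·67449σ/√(2n). Recomputing them from the printed standard deviations and numbers of mandibles gives the printed values. The brief sketch below does this and expresses each mean bilateral difference in units of its probable error. The function and variable names are illustrative.

```python
import math

PE_FACTOR = 0.67449


def pe_mean(sd, n):
    """Probable error of a mean of n observations with standard deviation sd."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_sd(sd, n):
    """Probable error of a standard deviation based on n observations."""
    return PE_FACTOR * sd / math.sqrt(2 * n)


# Table VII: character -> (mean L - R, standard deviation, number of mandibles)
table_vii = {"m,p,": (-0.079, 0.84, 57), "c,l": (-0.028, 1.01, 50), "rb'": (-0.006, 1.16, 64)}

for char, (mean, sd, n) in table_vii.items():
    ratio = abs(mean) / pe_mean(sd, n)
    print(f"{char}: PE(mean) = {pe_mean(sd, n):.3f}, PE(sd) = {pe_sd(sd, n):.3f}, "
          f"|mean|/PE = {ratio:.2f}")
```

Each mean difference is only about one probable error or less, which is consistent with the statement that the bilateral differences are quite insignificant.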

Table VIII shows that a high percentage of the Australian mandibles, both male and female, had never lost a single tooth during life. This is in marked contrast to the civilized English series represented by the Farringdon Street (composed of seventeenth-century Londoners) and Spitalfields series (probably a population living in England during the Romano-British period). The Indian series has about 50 per cent. of its mandibles (25 out of 49) with a complete set of teeth, and, while being far below the corresponding percentage for the Australian series (approximately 80 per cent.), such a figure is much higher than that for the English series, which for males and females combined gives a percentage of 22·5.‡ There were several cases in the English and Indian series of arthritic condyles, though not a single instance of this was found in the Australian series, which contains a case of syphilis (see Plate V A) according to the catalogue of the Royal College of Surgeons. The small numbers found in category 3 of Table VIII

* These measurements are usually taken, when possible, on the left side only.

† G. M. Morant, loc. cit. Table II.

‡ In the case of the long Egyptian E series examined by Dr Martin the percentages having all teeth, including third molars, present at death are 40·7 for males and 44·3 for females.
do not in any way affect the conclusions stated above and based on the figures found in category 2. It is to be noted that the third category of the table includes some cases for which the third molars had probably never erupted, for an examination of the dental arcade often fails to determine whether a molar was lost before death or whether it had never erupted at all. The following cases of overcrowding were noted: Spitalfields male 3, female 2; Australian male 4, female 0; Farringdon Street male 1, female 1; Punjabi male 0. It appears that the opinion expressed by several anthropologists to the effect that overcrowding of the teeth is on the increase among modern civilized peoples is not supported by these figures, which


TABLE VIII
Comparisons of the dentitions of series of mandibles

Category | Male: Spitalfields | Male: Farringdon Street | Male: Australian | Male: Punjabi | Female: Spitalfields | Female: Farringdon Street | Female: Australian
1. No. for which dental arcade is complete | 27 | 69* | 72† | 49‡ | 22 | 83§ | 36
2. All teeth including 3rd molars present at death | 10 | 18 | 58 | 25 | 5 | 12 | 26
3. All teeth except one or both 3rd molars present at death | 5 | 8 | 3 | 1 | 4 | 8 | 2
4. No. having lost one or more teeth in front of molars before death | 5 | 24 | 4 | 16 | 6 | 37 | 7

* 40 measured, 29 not measured; these latter were sexed anatomically.
† Including Northern Territory mandibles and 1 with socket for single pair of incisors.
‡ Including 2 with sockets for three incisors only.
§ 50 measured, 33 not measured; these latter were sexed anatomically.


refer only to the more marked cases.|| Furthermore, to assert on the alleged 
evidence of the increased incidence of overcrowding of the teeth that there is a 
tendency for modern man, civilized and uncivilized, to have a dental arcade 
smaller in size than that of his primitive ancestors would be unjustifiable, since 
overcrowding is dependent on the ratio of size of teeth to size of jaw, and not on 
the absolute size of the dental arcade. 
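The percentages quoted in the discussion of Table VIII follow directly from its categories 1 and 2. The small sketch below computes them for the Punjabi and Australian series. Pooling males and females is shown only for the Australian series, as an illustration. The names used are illustrative.

```python
# Table VIII: (category 1, complete dental arcade; category 2, all teeth present at death)
table_viii = {
    ("Punjabi", "male"): (49, 25),
    ("Australian", "male"): (72, 58),
    ("Australian", "female"): (36, 26),
}


def percent_all_teeth(complete, all_present):
    """Percentage of mandibles with a complete arcade that retained every tooth."""
    return 100 * all_present / complete


for (series, sex), (complete, present) in table_viii.items():
    print(f"{series} {sex}: {percent_all_teeth(complete, present):.1f} per cent.")

# Pooled Australian figure (males and females together)
complete = sum(table_viii[("Australian", s)][0] for s in ("male", "female"))
present = sum(table_viii[("Australian", s)][1] for s in ("male", "female"))
print(f"Australian, both sexes: {percent_all_teeth(complete, present):.1f} per cent.")
```

The results are 51·0 for the Punjabi, 80·6 for the Australian males, and 77·8 for the pooled Australian series. These agree with the "about 50" and "approximately 80" per cent. given in the text.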


|| Several cases of impaction of the third molar were also noted.

7. Comparisons by the method of the coefficient of racial likeness. One of the primary needs of physical anthropologists in dealing with problems of human evolution is a means of classifying the races of man. To meet this need Professor Karl Pearson devised the method of the coefficient of racial likeness, which is a generalized statistical criterion derived from pairs of racial series and based on the comparison of a number of their mean measurements for different characters. Large numbers of these coefficients computed for different groups of cranial series have been published, chiefly in Biometrika, and the method has also been applied to data for series of living people in several papers. In his study of the mandible (loc. cit. 1936) Morant gives values for all possible pairs of 12 male and 5 female series, and he says (p. 116) that: “these criteria lead to a reasonable arrangement of the types which encourages the hope that a similar comparison of more extended material would furnish valuable aid in estimating racial relationships. At the same time the fact that samples of the sizes at present available are not differentiated cannot be accepted as a test of racial identity.” The coefficients given by Martin (loc. cit. Table V) between his series of 26th-30th Dynasty mandibles from Gizeh and the earlier material do not conflict with these conclusions. We are now able to add comparisons with the 4 male and 3 female series described in the present paper, making a total of 17 male and 9 female, and it will be seen that these make it necessary to reconsider the position and, indeed, to question whether it is possible to obtain any rational classification of races from measurements of mandibles. The mean measurements of the new series are given in Table IX.

If $M_s$ is the mean and $\sigma_s$ the standard deviation of the $s$th character, these being based on $n_s$ individuals, in the case of the first series, and if $M_s'$, $\sigma_s'$ and $n_s'$ are the corresponding constants for the second series, then what is now called the “crude” coefficient of racial likeness is defined to be:

$$\frac{1}{m}\sum_{s=1}^{m}\left\{\frac{(M_s - M_s')^2}{\sigma_s^2/n_s + \sigma_s'^2/n_s'}\right\} - 1 \pm 0.67449\sqrt{\frac{2}{m}},$$

where $m$ characters are compared. The standard deviations for the shorter series are likely to be particularly unreliable, and hence it is assumed that they are equal to those for the longest homogeneous series available. It has been shown above that, though these constants for different series show a few significant differences, yet they tend to be of closely similar orders in the case of a particular character. Supposing that $\sigma_s = \sigma_s'$, the coefficient becomes:

$$\frac{1}{m}\sum_{s=1}^{m}\left\{\frac{n_s n_s'}{n_s + n_s'}\left(\frac{M_s - M_s'}{\sigma_s}\right)^2\right\} - 1 \pm 0.67449\sqrt{\frac{2}{m}},$$

which is written, for convenience, with $\alpha_s^2$ standing for the term in braces, as:

$$\frac{1}{m}\sum_{s=1}^{m}\alpha_s^2 - 1 \pm 0.67449\sqrt{\frac{2}{m}}.$$
Following Morant, the standard deviations of the ancient Egyptian series from 
Qau were used.* From the crude coefficient we can obtain a generalized measure 


* The Qau is the longest of the series used in his paper. That of the 26th-30th Dynasty mandibles 
from a cemetery at Gizeh, described by Martin, is considerably longer, but the standard deviations 
for it were not available when the computation for the present paper was carried out. 
of the probability that the two samples compared were drawn from populations with identical means. The formula is theoretically more correct if the m characters used are uncorrelated with one another intra-racially. Ten were selected by Morant, principally because the correlations between them are low in the case of the male Qau series: of the total 45 r's, 40 are less than ·3, the highest value is +·597, and no single character has more than two coefficients greater than ·3. The same 10 characters were used in calculating all the coefficients of racial likeness for mandibular measurements, including those given in the present paper. It may be noted that this number is considerably smaller than the 31 used, when possible, in the comparisons of cranial series by the same method.

Having the same nature as a measure of probability, the crude coefficient of racial likeness depends on the sizes of the samples compared. But the anthropologist is more interested in a measure of the absolute divergence of the types, and this is supposed to be obtained from the crude coefficients by adjusting them to the values they might be expected to have if the samples were made up, not of the numbers actually available, but of 100 individuals each. If $\tilde n_s$ and $\tilde n_s'$ are the mean numbers of individuals available for the m characters in the case of the first and second series in the comparison, respectively, then the reduced coefficient is defined to be:

$$\text{reduced coefficient} = 50 \times \frac{\tilde n_s + \tilde n_s'}{\tilde n_s\,\tilde n_s'} \times (\text{crude coefficient}).$$
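Both coefficients can be computed in a few lines. The sketch below assumes, as the text does, a single standard deviation per character shared by the two series, and it takes the per-character sample sizes as lists. The function names are illustrative. As a check, consider the crude coefficient 0·79 ± ·30 given in the footnote to Table X. Scaling it by the mean sample sizes of the Farringdon Street and Punjabi series (ñ = 31·8 and 43·4) gives the printed reduced value, 2·15 ± 0·82.

```python
import math

PE_FACTOR = 0.67449


def crude_crl(means_a, means_b, sds, n_a, n_b):
    """Crude coefficient of racial likeness and its probable error, using one
    standard deviation per character for both series (sigma_s = sigma_s')."""
    m = len(sds)
    total = 0.0
    for ma, mb, sd, na, nb in zip(means_a, means_b, sds, n_a, n_b):
        total += (na * nb / (na + nb)) * ((ma - mb) / sd) ** 2
    return total / m - 1, PE_FACTOR * math.sqrt(2 / m)


def reduced_crl(crude, pe_crude, mean_n_a, mean_n_b):
    """Scale a crude coefficient (and its probable error) to samples of 100 each."""
    factor = 50 * (mean_n_a + mean_n_b) / (mean_n_a * mean_n_b)
    return crude * factor, pe_crude * factor


# Farringdon Street : Punjabi (male), crude coefficient from the Table X footnote
value, pe = reduced_crl(0.79, 0.30, 31.8, 43.4)
print(f"reduced = {value:.2f} +/- {pe:.2f}")  # reduced = 2.15 +/- 0.82
```

With m = 10 characters, the probable error of the crude coefficient is 0·67449√0·2 ≈ ·30 whatever the data, which is why the same ·30 appears in the footnote.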


In a general way the classifications of groups of cranial series based on reduced 
coefficients of racial likeness that have hitherto been given accord with evidence 
of other kinds. The failure of the same method to give as reasonable results when 
applied to series of mandibles may possibly be due in this case to the inadequacy 
of one or other of the assumptions made in calculating the reduced coefficients.
This possibility will be examined after presenting the results. 

The reduced coefficients for the four new series, and between them and all the 
earlier ones, are given in Table X, and the values for all other pairs of the 17 male 
and 9 female series will be found in the papers by Morant and Martin. It has been 
shown repeatedly for cranial data that in attempting to derive a classification of 
the racial types from such material the most reasonable and suggestive results 
are always obtained if only the lowest orders of reduced coefficients are con- 
sidered, while no account is taken of any greater than an arbitrarily defined limit. 
It appears to be an advantage to choose this limit as low as any which can be 
conveniently used for a particular group of series. In the case of the mandibular 
data the larger values of the reduced coefficients fail entirely to provide any 
arrangement of the types which could be supposed to indicate their inter-relation- 
ships and, accordingly, only the lower values will be considered now. In dis- 
cussing the material available to him, Morant ignored all greater than 11, and for 
the material available now it was found that a limit of 10 could be used more 
TABLE X
Reduced coefficients of racial likeness for mandibular series

Series | Sex | ñ* | Spitalfields | Farringdon Street | Punjabi | Australian
Anglo-Saxon | ♂ | 42·4 | 6·54 ± 0·67 | 17·88 ± 0·83 | 20·14 ± 0·70 | 25·40 ± 0·61
 | ♀ | 41·2 | 21·54 ± 0·95 | 33·77 ± 0·74 | | 23·16 ± 0·88
Dunstable | ♂ | 37·3 | 6·57 ± 0·72 | 11·82 ± 0·88 | 15·81 ± 0·75 | 26·26 ± 0·66
Spitalfields | ♂ | 48·5 | | 2·82 ± 0·79 | 5·46 ± 0·66 | 38·10 ± 0·56
 | ♀ | 26·0 | | 16·75 ± 0·95 | | 51·07 ± 1·09
Farringdon Street | ♂ | 31·8 | 2·82 ± 0·79 | | 2·15 ± 0·82† | 49·56 ± 0·73
 | ♀ | 40·3 | 16·75 ± 0·95 | | | 41·99 ± 0·89
Badari Egyptian (Predynastic) | ♂ | 33·5 | 40·66 ± 0·76 | 43·27 ± 0·92 | 31·04 ± 0·80 | 56·87 ± 0·70
 | ♀ | 18·9 | 53·29 ± 1·38 | 31·84 ± 1·17 | | 42·39 ± 1·31
Egyptian (4th-11th Dynasty) | ♂ | 66·4 | 11·81 ± 0·54 | 11·09 ± 0·70 | 4·50 ± 0·57 | 32·43 ± 0·48
 | ♀ | 55·7 | 31·69 ± 0·85 | 17·18 ± 0·64 | | 19·27 ± 0·79
Sedment Egyptian (9th Dynasty) | ♂ | 32·4 | 16·51 ± 0·78 | 21·43 ± 0·94 | 10·15 ± 0·81 | 31·80 ± 0·72
 | ♀ | 21·2 | 29·06 ± 1·29 | 17·92 ± 1·08 | | 40·91 ± 1·23
Kerma Egyptian (12th-13th Dynasty) | ♂ | 55·7 | [?]·65 ± 0·58 | 29·34 ± 0·74 | [?]·28 ± 0·62 | 19·75 ± 0·52
 | ♀ | 44·7 | 50·39 ± 0·92 | 29·36 ± 0·71 | | 29·78 ± 0·85
Gizeh Egyptian (26th-30th Dynasty) | ♂ | 211·7 | 5·02 ± 0·38 | 9·51 ± 0·55 | [?]·48 ± 0·42 | 32·60 ± 0·33
 | ♀ | 131·8 | 10·16 ± 0·69 | 6·08 ± 0·48 | | 39·67 ± 0·62
Tamil | ♂ | 33·0 | 8·29 ± 0·77 | 4·94 ± 0·93 | 7·56 ± 0·80 | 34·71 ± 0·71
Punjabi | ♂ | 43·4 | 5·46 ± 0·66 | 2·15 ± 0·82† | | 39·38 ± 0·60
Nepalese | ♂ | 18·9 | 13·26 ± 1·11 | 10·67 ± 1·27 | 10·91 ± 1·15 | 26·77 ± 1·05
Tibetan A | ♂ | 24·9 | 12·41 ± 0·92 | 8·72 ± 1·08 | 17·01 ± 0·96 | 30·25 ± 0·86
Tibetan B | ♂ | 11·9 | 26·17 ± 1·58 | 36·08 ± 1·74 | 32·49 ± 1·61 | 9·71 ± 1·52
Hylam Chinese | ♂ | 38·8 | 5·24 ± 0·70 | 14·33 ± 0·86 | 17·89 ± 0·74 | 26·39 ± 0·64
Fukien Chinese | ♂ | 37·5 | 18·22 ± 0·71 | 30·09 ± 0·88 | 27·15 ± 0·75 | 22·16 ± 0·66
Australian | ♂ | 59·0 | 38·10 ± 0·56 | 49·56 ± 0·73 | 39·38 ± 0·60 |
 | ♀ | 29·3 | 51·07 ± 1·09 | 41·99 ± 0·89 | |

* The ñ's are the mean numbers of mandibles available for the 10 characters (w,, zz, c,l, ml, m,p,, rb', h,, rl, R∠ and 100 g,g,/c,l) used in computing the coefficients.
† The crude coefficient corresponding to this is 0·79 ± ·30.
conveniently. The arrangement suggested by the reduced coefficients can be appreciated most easily from Fig. 1, which shows all the connexions between the male series given by values less than 10. The Badari Predynastic Egyptian is the only series which has no reduced coefficient less than the arbitrary limit chosen. The 17 series can be divided into three groups (an English, an ancient Egyptian and an Asiatic) and the Australian series. Considering these in turn, an unexpected relation is at once found in the insignificant coefficient between the Anglo-Saxon and Dunstable series, indicating that no distinction can be made
between their mandibular types although the cranial types are clearly differentiated (reduced coefficient = 13·66 ± ·45*). The mandibular value (−0·18 ± ·76) is here less than the cranial and of an entirely different order. Equally unexpected is the linking of both the Anglo-Saxon and the Farringdon Street series to the Spitalfields, and also the lack of any connexion between the Anglo-Saxon and Farringdon Street series. The cranial evidence shows the inverse relation to this, viz. a close connexion between the Anglo-Saxons and seventeenth-century Londoners and a clear distinction between these two and the Spitalfields population, which is of uncertain date. Between the Anglo-Saxon and Farringdon Street series the cranial reduced coefficient is 8·79 ± ·32 and the mandibular 17·88 ± ·83; here the mandibular value is greater than the cranial and of a different order.


Fig. 1. The lowest reduced coefficients of racial likeness for 17 series of male mandibles. [Diagram of connexions between the English, ancient Egyptian and Asiatic series and the Australian; key: insignificant; significant and less than 5; 5 to 10.]
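The classification drawn in Fig. 1 can be sketched as a small function. The limit of 10 and the boundary at 5 come from the text and the figure key. Treating a coefficient as insignificant when it is less than three times its probable error is an assumption made here. It was chosen only because it agrees with the pairs the text describes as insignificant: Farringdon Street and Punjabi (2·15 ± 0·82), Sedment and Kerma (1·90 ± ·74), and Anglo-Saxon and Dunstable (−0·18 ± ·76). The function name and the `sig_multiple` parameter are illustrative.

```python
def classify_link(reduced, pe, limit=10.0, sig_multiple=3.0):
    """Return the Fig. 1 category for a pair of series, or None if no line is drawn.

    `sig_multiple` (a coefficient counts as insignificant below this many
    probable errors) is an assumed convention, not one stated in the text.
    """
    if reduced >= limit:
        return None
    if abs(reduced) < sig_multiple * pe:
        return "insignificant"
    if reduced < 5:
        return "significant and less than 5"
    return "5-10"


# Reduced coefficients (value, probable error) for male series from Table X
pairs = {
    "Farringdon Street : Punjabi": (2.15, 0.82),
    "Spitalfields : Farringdon Street": (2.82, 0.79),
    "Punjabi : Egyptian (4th-11th Dynasty)": (4.50, 0.57),
    "Spitalfields : Punjabi": (5.46, 0.66),
    "Spitalfields : Australian": (38.10, 0.56),
}
for pair, (value, pe) in pairs.items():
    print(pair, "->", classify_link(value, pe))
```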


Turning to the ancient Egyptian group, the fact that the Badari shows no connexion with any other series is not surprising, as it is the only one available of predynastic date and it is assigned to one of the earliest known predynastic periods. The insignificant coefficient between the Sedment and Kerma series (1·90 ± ·74) is unexpected, as the reduced coefficient for the cranial series is 16·41 ± ·31. For this Egyptian group, however, there are no results as unsatisfactory as those noted for the English group. The Asiatic mandibular series again show entirely unexpected resemblances and divergences. The Tamil, Nepalese


* The reduced coefficients of racial likeness for cranial series given here are all taken from papers 
in Biometrika published in 1926 or later. 
and Tibetan A series have coefficients with one another which all differ insignificantly from zero, while the three corresponding cranial reduced values range from 12·6 to 40·9. For the Hylam and Fukien Chinese the reduced mandibular coefficient is 7·66 ± ·79, which indicates distinct differentiation. The cranial types of these two series are very different, but it is suspected that this is due to the fact that the Hylam skulls were artificially deformed, though their facial and palatal measurements, which are not distinguished from the Fukien, do not appear to have been affected. The mandibles do distinguish the types, and this cannot be attributed to deformation.

These intra-group connexions do not encourage the hope that it will be pos- 
sible to obtain any suggestive classification of the types from the reduced 
coefficients obtained from the mandibular measurements, and the comparison 
of series belonging to different groups makes this obvious. The most surprising 
connexion of the latter kind is the insignificant coefficient for the Farringdon 
Street and Punjabi series. But the Farringdon Street also has a lower value with 
the Tamil than with either the Anglo-Saxon or Dunstable series, and the Punjabi 
has a lower value with the Qau Egyptian than with any Asiatic series. It appears 
to be quite impossible to accept the coefficients as measures of racial relationship: 
they sometimes show close resemblance in type where no close racial affinity can 
be imagined, and they sometimes indicate clear distinction in type where close 
racial affinity must have existed. It may be noted that all the coefficients which 
make it impossible to obtain as suggestive a classification of the data as that 
obtained from the series previously dealt with are with one or other of three of 
the four new series. If the Spitalfields, Farringdon Street and Punjabi are 
omitted from Fig. 1 no connexions of the order considered are found between the
three groups of series. The new material has apparently demonstrated the defect 
of the method. 

It is clear that reduced coefficients of racial likeness for the mandible tend, in 
general, to be markedly lower than the values corresponding to them for the 
cranium, and only one case for which the reverse is true has been noted above, 
viz. that of the Anglo-Saxon and Farringdon Street series. This suggests that the 
unsatisfactory nature of the results shown in Fig. 1 may be due to the fact that 
some of the series used are too small for the purpose. The limiting size of cranial 
samples which yield suggestive and consistent results when compared in the 
same way has been determined empirically. In this case 50 is a safe limit to take,
but samples composed of 30 to 50 individuals generally yield reliable results. The
sizes of the series of mandibles can be judged from the n̄'s in Table X, these being
the average numbers of bones on which the means of the 10 characters used in
computing the coefficients are based. Two of the male series have n̄'s under 20,
and no reliance whatever can be placed on results obtained from cranial samples
no larger than these. If all series with n̄'s less than 50 are ignored, we are only
left with the Qau, Kerma and Gizeh (E) Egyptian and the Australian. The first
three are connected with one another and the last is widely removed from all
of them by the coefficients, so there is nothing unexpected in these results, as far
as they go. If the limit is lowered to 40, the Anglo-Saxon, Spitalfields and Punjabi
series will also be included. Unexpected connexions are then found between the
Punjabi series, on the one hand, and the Spitalfields, Gizeh Egyptian and Qau
Egyptian, on the other, but of these three coefficients only one is less than 5
(Punjabi and Qau reduced = 4·50 ± ·57), and it has been shown that some ancient
Egyptian and modern Indian cranial types are remarkably similar.* The results,
which it is impossible to accept if the coefficients are considered as measures of
racial relationship, are only evident when the shorter series are brought into the
picture. The six insignificant coefficients, for example, are quite unexpected and
unacceptable, but for every one of these one or both of the series compared has
an n̄ less than 40. The fact that several marked differences may be found between
corresponding male and female coefficients in Table X also suggests that several
of the series are too short to give consistent results. The limiting size of sample 
required can only be determined empirically at present, as any theoretical esti- 
mate of it would require a knowledge of inter-racial variabilities which could only 
be found from far more extensive material than that available. It is quite possible 
that suggestive results would be given by series of mandibles made up by 50 or 
more individuals, or it may be necessary to adopt a still higher limiting size; and 
it may also be necessary to reject all reduced coefficients greater than 5, say, in 
interpreting such data. We cannot say that the method applied to measurements 
of series of mandibles is incapable of yielding results of value to the anthropologist, 
since it may be that the lack of suggestiveness of the arrangement shown in 
Fig. 1 is merely due to the fact that certain essential conditions were not observed 
in preparing that diagram. Data for additional series of a sufficient length will 
be required, either to justify the use of the coefficient of racial likeness in this 
case, or to demonstrate that it cannot be used profitably. All we can assert is that 
short series—composed of fewer than 40 mandibles, say—will not provide what is 
wanted. 

Certain devices are used in calculating the reduced coefficients of racial 
likeness, and it may be suggested that these are partly responsible for the dis- 
cordant results, and that the use of a theoretically more correct formula would 
modify them appreciably. The effect of the use of a single set of σ's instead of the
values for each series used may be examined first. Martin has given reduced
coefficients between the Egyptian E and a number of other series computed by
using the Qau σ's, in one case, and those of the Egyptian E series itself, in the
other.† Corresponding pairs of the lower coefficients are all in close agreement,
though a few significant differences were found for the higher values, which are,
however, neglected in attempts to interpret the data. Three of the new male
coefficients were calculated in the two ways with the following results, the first
value of the reduced constant being that found by using the Qau σ's and the
second that found by using the Egyptian E σ's: Farringdon Street and Punjabi
2·15 ± ·82, 2·56 ± ·82; Farringdon Street and Tamil 4·94 ± ·93, 5·21 ± ·93;
Farringdon Street and Anglo-Saxon 17·88 ± ·83, 22·16 ± ·83. These results accord
with Martin's, the coefficients calculated in the two ways being in close agreement
in the case of the two lower pairs. It is unlikely that any of the unexpected
relationships found between the series can be attributed to the use of a constant set
of σ's in place of the sets for each of the series in a particular comparison.

* See "A Study of the Badarian Crania recently excavated by the British School of
Archaeology in Egypt", by Brenda N. Stoessiger, Biometrika, XIX (1927), pp. 110-50.
† Loc. cit. Table V.
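The comparison made by Martin, and the three recalculated male coefficients above, depend on whether one standard set of σ's (the Qau values) or each series' own σ's enters the formula. Below is a minimal sketch of Pearson's crude coefficient in its commonly cited form: the mean of the α's minus 1, with probable error 0·67449√(2/M) for M characters. It takes the σ's as an explicit argument so that the two choices can be compared. All numbers are hypothetical. The reduced coefficient quoted in the text involves a further adjustment for sample size, which this sketch does not reproduce.

```python
import math
from typing import Sequence


def alpha(m1: float, m2: float, n1: float, n2: float, sigma: float) -> float:
    """Squared difference of means over its standard error, using one sigma:
    n1*n2/(n1+n2) * ((m1 - m2)/sigma)**2."""
    return (n1 * n2 / (n1 + n2)) * ((m1 - m2) / sigma) ** 2


def crude_crl(means1: Sequence[float], means2: Sequence[float],
              ns1: Sequence[float], ns2: Sequence[float],
              sigmas: Sequence[float]) -> tuple[float, float]:
    """Crude coefficient of racial likeness (mean alpha minus 1) and its
    probable error 0.67449 * sqrt(2/M) for M characters."""
    alphas = [alpha(a, b, n1, n2, s)
              for a, b, n1, n2, s in zip(means1, means2, ns1, ns2, sigmas)]
    m = len(alphas)
    return sum(alphas) / m - 1, 0.67449 * math.sqrt(2 / m)


# Hypothetical data for three characters (not taken from the paper).
means_a, means_b = [118.0, 32.0, 65.0], [116.0, 31.0, 66.5]
n_a, n_b = [50, 50, 50], [40, 40, 40]
reference_sigmas = [6.0, 3.0, 5.0]   # one standard set used for every pair
own_sigmas = [6.5, 3.2, 5.4]         # the series' own (here larger) values

crl_ref, pe = crude_crl(means_a, means_b, n_a, n_b, reference_sigmas)
crl_own, _ = crude_crl(means_a, means_b, n_a, n_b, own_sigmas)
print(f"reference sigmas: {crl_ref:.2f} +/- {pe:.2f}")
print(f"own sigmas:       {crl_own:.2f} +/- {pe:.2f}")
```

Larger σ's shrink every α, so the choice of σ set shifts all coefficients in the same direction. The relative order of small coefficients is therefore little disturbed, which is consistent with the close agreement reported for the lower pairs.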

It is unlikely, too, that the results obtained from the reduced coefficients of
racial likeness differ appreciably from those which would be given by a theoretically
more correct formula which takes into account the intra-racial correlations
between the different measurements used. The 10 characters were chosen 
because the correlations between them are nearly all of a low order, and there is 
a far closer approach to the ideal condition here than in the case of the characters 
used in computing the cranial coefficients. Also, the mandibular coefficients which 
differ insignificantly from zero show a difference of means for nearly every 
character considered separately which would usually be considered insignificant,* 
and under these circumstances it cannot matter much whether the correlations 
between the characters are taken into account or not. These insignificant coeffi- 
cients are largely responsible for our inability to accept the criterion as a measure 
of racial relationship. 
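The argument is that ignoring intra-racial correlations matters little when those correlations are low. The text does not name the "theoretically more correct formula". The sketch below substitutes a Mahalanobis-type generalized distance, d′S⁻¹d, as an assumed stand-in and compares it with the sum of squared standardized differences, which ignores the correlations. The differences, standard deviations and correlations are all hypothetical, and the small Gaussian-elimination solver is included only to keep the block self-contained.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def generalized_distance(d, sigmas, corr):
    """d' S^-1 d with covariance S_ij = r_ij * s_i * s_j (Mahalanobis-type)."""
    k = len(d)
    cov = [[corr[i][j] * sigmas[i] * sigmas[j] for j in range(k)] for i in range(k)]
    w = solve(cov, d)
    return sum(di * wi for di, wi in zip(d, w))


def uncorrelated_distance(d, sigmas):
    """The same quantity when all correlations are ignored: sum of (d/s)^2."""
    return sum((di / si) ** 2 for di, si in zip(d, sigmas))


# Hypothetical differences of means and standard deviations for two characters.
d, sigmas = [2.0, 1.0], [6.0, 3.0]
for r in (0.0, 0.1, 0.2, 0.5):
    corr = [[1.0, r], [r, 1.0]]
    print(f"r = {r}: with correlation {generalized_distance(d, sigmas, corr):.4f}, "
          f"ignoring it {uncorrelated_distance(d, sigmas):.4f}")
```

With r = 0 the two measures coincide. With r = 0·1 they differ by about 10 per cent, and the gap widens as the correlation grows, which is why the low correlations among the 10 chosen characters matter to the argument.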

Consideration of the same group makes it evident that the method of "reducing"
the crude coefficient cannot be responsible for all the unexpected results
obtained. In the case of the six crude coefficients which differ insignificantly from 
zero there is, in fact, no need to reduce them, and it is clear (from cranial evidence) 
that the device used achieves the end in view sufficiently well in other cases. As 
far as can be seen now, therefore, no one of the assumptions made, or devices 
used, in applying the method of the coefficient of racial likeness can be considered 
responsible for the failure of the method to give results of value. This failure may 
be due to the fact that it has been applied to samples which are too small. Another 
possibility is that the group of measurements used is unsuitable for the purpose 
in view, and this is discussed in the following section.

* See p. 109 below.

8. A comparison of single characters. The relative values of different characters
for purposes of racial classification can be estimated from the α's found in
computing the coefficients of racial likeness. An α is approximately the square of a
quantity which is the difference of two means divided by its standard error, and
the difference, if considered by itself, may be supposed clearly significant if
the α is greater than 10. Comparisons have been made between 17 male series for
the same 10 characters, so there is a total of 1360 α's for these. Of this total 374
(27·5 per cent.) are greater than 10, and for 12 of the 17 series Morant found a
percentage of 27·3.* These percentages are in remarkably close agreement, in
spite of the fact that they depend to a certain extent on the sizes of the series
compared: on the average, comparisons of longer series will be expected to show
more significant α's than comparisons of shorter series. But it is clear that for the
series available the characters used are capable of making clear distinctions.
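The counts in this paragraph can be checked directly: 17 series give 136 pairs, each contributing one α per character. The sketch below also shows how rarely α would exceed 10 by chance alone, under the assumption (not made in the text) that α behaves exactly like the square of a standard normal deviate when there is no real difference.

```python
import math

n_series, n_characters = 17, 10
n_pairs = math.comb(n_series, 2)        # pairwise comparisons of series
n_alphas = n_pairs * n_characters       # one alpha per character per pair
n_large = 374                           # alphas greater than 10, from the text
pct_large = 100 * n_large / n_alphas

# If alpha were exactly z**2 with z standard normal (no real difference),
# P(alpha > 10) = P(|z| > sqrt(10)) = erfc(sqrt(10 / 2)).
p_chance = math.erfc(math.sqrt(10 / 2))

print(n_pairs, n_alphas, f"{pct_large:.1f} per cent")
print(f"chance rate of alpha > 10: {100 * p_chance:.2f} per cent")
```

The chance rate is well under one per cent, so an observed 27·5 per cent reflects real differences between the series rather than sampling noise.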
For each of the 10 characters, in the comparison of the 17 series, there is a total
of 136 α's. The percentages of α's greater than 10 are: RZ 12·5; rl 20·6; h, 24·3;
Cyl 25·0; mop, 25·7; zz 28·7; ml 28·7; rb' 30·9; 100 g,g,/c,l 36·8; w, 41·9. Some
characters evidently distinguish the types far more effectively than others, and
it must be remembered that two were omitted from the list used in computing the
coefficients because they appeared to be practically constant for the series
considered by Morant. For the mandibular angle (M∠) he only found three
significant differences among 66 comparisons of mean values. This was the more
surprising since anthropologists have often supposed that this character is of
peculiar importance. It shows great intra-racial variability, the standard
deviations for it being of the order 6°, and the means for 16 male series all lie
between 120°·0 and 125°·3. It is true that the Australian mean of 117°·0 for 59
male mandibles is clearly divergent. The index expressing the breadth at the
angles as a percentage of the breadth at the tips of the coronoid processes
(100 g,g,/c,c,) was also omitted because it only showed two significant differences
in 66 comparisons. The intra-racial standard deviations for this character are
of the order 7·0 and the range of the means for 17 male series is 95·0 to 102·9: the
value of 100·4 for the Australian bones is not peculiar.
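Each percentage in the list above is based on 136 α's, so the underlying counts can be recovered. The counts below are inferred (the text gives only percentages). Each one is the nearest whole number consistent with its printed percentage. As a consistency check, they add up to exactly the 374 α's greater than 10 quoted in the preceding paragraph.

```python
# Percentages of alphas greater than 10, in the order the characters are listed
# in the text; each is based on 136 alphas.
percentages = [12.5, 20.6, 24.3, 25.0, 25.7, 28.7, 28.7, 30.9, 36.8, 41.9]
N_PER_CHARACTER = 136

# Inferred counts: the nearest whole number consistent with each percentage.
counts = [round(p * N_PER_CHARACTER / 100) for p in percentages]

# Each inferred count must reproduce its printed percentage to one decimal.
assert all(abs(100 * c / N_PER_CHARACTER - p) < 0.05 for c, p in zip(counts, percentages))

print(counts, sum(counts))
```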

At the other extreme we find the mental angle (C′∠), for which inter-racial
variability is evidently much greater in proportion to intra-racial variability
than in the case of the two preceding characters. This showed 35 significant
differences among 66 comparisons of means, but it was not included among the
characters used in computing coefficients of racial likeness because it was feared
that it is a less reliable measurement than most of the others. It has been shown
above (p. 89) that the readings of two observers sometimes show a very satis-
factory agreement, and it is unlikely that personal equation is a disturbing factor
in the case of comparisons between most of the means available for this character.
A greater angle denotes a lesser projection of the chin. For 4 English male series
the means range from 61°·8 to 70°·5, for 5 Egyptian from 70°·2 to 75°·5, for 7 Asiatic
from 62°·9 to 77°·1, and the Australian mean for 40 bones is 78°·0. Several of these
means are based on small numbers of specimens, and some have standard errors 
of the order 2°, but it is clear that the character often makes very clear distinc- 
tions between racial types. The Australian mean is extreme, but less removed 
from some of the others than would have been anticipated. 

* Loc. cit. p. 114.

We may now ask whether the failure of the method of the coefficient of racial
likeness to give suggestive results when applied to series of mandibles is due to
the choice of characters used in computing it. One of the most disconcerting results 
is the occurrence of insignificant coefficients in cases where distinct differentiation 
would have been expected, and where it is shown by the coefficients for the corre- 
sponding cranial series. Six insignificant values for male series of mandibles have 
been found (see Fig. 1), involving nine series. Standard deviations have only been 
given for four of these—the Dunstable and Farringdon Street English, the 
Punjabi and the Kerma Egyptian—as the others were considered too short for 
the purpose, and for the remaining five the Qau Egyptian constants may be 
applied, as in computing the coefficients. Comparisons of the means are summar- 
ized in Table XI, “none” signifying that there are no differences greater than 
three times their probable errors and the numbers in brackets being the ratios of 
the differences to their probable errors in cases where these are greater than 3. 


TABLE XI

A comparison of the significant differences between means for two groups of
characters, in cases where the coefficients of racial likeness indicate an
insignificant difference*

Series compared                 | 10 C.R.L. characters  | 11 other characters
Anglo-Saxon and Dunstable       | None                  | None
Farringdon Street and Punjabi   | h, (4·4), 100 (3·6)   | gogo (5·7), C′∠ (5·2)
Sedment and Kerma Egyptian      | ml (4·7)              | 100 c,h/ml (3·4), 100 c,c,/ml (3·2), C′∠ (3·9)
Tamil and Nepalese              | h, (3·1)              | C′∠ (4·6)
Tamil and Tibetan A             | None                  | C′∠ (5·5)
Nepalese and Tibetan A          | None                  | C′∠ (3·3)

* See text for explanation.


In these 6 cases the 11 characters which are not used in computing the 
coefficients do tend to show a larger number of significant differences, and clearer 
differentiation in the case of some characters, than do the 10 characters used. 
But this difference depends almost entirely on the mental angle (C′∠), and if it
were omitted the choice of any group of characters from the remaining 20 would
lead to almost identically the same results as those derived from the group
adopted: there would be no clear distinction between the pairs of series compared.
The situation is changed if C′∠ is included, but it is unsuitable as a coefficient
of racial likeness character, since it is feared that its readings for some of the earlier 
series were not found in precisely the same way as that employed later. 

The Australian series shows no low coefficient with any other, and this is 
largely due to the fact that two of its means (for zz and m,p,) are the greatest yet 
found. But the same series also has the greatest c,/ and C′∠, and its M∠ is the
smallest, and these three characters are not used in computing the coefficients.
The type would almost certainly be distinguished equally clearly if the 11 re- 
maining characters were used for this purpose instead of the 10 chosen. There is 
no doubt that the coefficients would be changed to some extent if they were 
computed for a different set of characters, but it seems probable that their orders 
would be little affected, and that the unexpected results which make it necessary 
to question the utility of the method would still be found. 


A comparison of characters considered singly throws some light on the cause
of these unexpected results. It will be sufficient to consider the 4 English series,
which give mandibular coefficients markedly different from those found for the
corresponding, but longer, series of crania. There are 6 comparisons, based on 10
characters, and there are only 20 of the 60 α's greater than 4. An α is approximately
the square of a quantity which is the difference of two means divided by the
standard error of the difference, and it would be expected to show some values
greater than 4 in a set of 60 comparisons merely as the result of chance, if in fact
all the series represented the same population. Only 7 of the α's are greater than
8, the largest being 22·9 and the next largest 17·9. For these four series there are
very few differences which are markedly significant. There is sufficient evidence 
to show that some pairs of the types do differ significantly, but it is also clear that 
the estimates of divergence in type provided by the coefficients are likely to be 
particularly unreliable owing to the influence of errors of random sampling. For 
the material available errors of this kind may be large enough to obscure the 
situation. This seems to be a possibility, and in view of it our general con- 
clusion must be not that coefficients of racial likeness based on measurements 
of series of mandibles are incapable of revealing racial relationships, but that
longer series than some of those used above will be needed in order to examine the 
use of the method applied to such material. Short series—made up by fewer than 
40 bones, say—will certainly not give what is needed. 
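How many of the 60 α's would exceed 4, or 8, by chance alone? The sketch below gives a rough answer. It assumes that each α is exactly the square of a standard normal deviate (a chi-square variable with one degree of freedom) and that the 60 α's are independent. Neither assumption holds strictly: the same four series recur in several comparisons, the characters are correlated, and the σ's are estimated.

```python
import math


def p_alpha_exceeds(threshold: float) -> float:
    """P(alpha > threshold) if alpha = z**2 with z standard normal,
    i.e. P(|z| > sqrt(threshold)) = erfc(sqrt(threshold / 2))."""
    return math.erfc(math.sqrt(threshold / 2))


n_comparisons = 60  # 6 pairs of English series x 10 characters
observed = {4: 20, 8: 7}  # counts of alphas above each threshold, from the text

for t, obs in observed.items():
    expected = n_comparisons * p_alpha_exceeds(t)
    print(f"alpha > {t}: expected by chance about {expected:.2f}, observed {obs}")
```

About 2·7 values above 4 and fewer than 0·3 values above 8 would be expected. The observed 20 and 7 therefore point to real differences between some of the English series, even though few individual α's are markedly large.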


9. Conclusions. This paper presents the results of a statistical treatment of 
two English (male and female), a Punjabi (male only) and an Australian (male and 
female) series of mandibles. Measurements were taken in accordance with the 
biometric technique, and estimates of their accuracy were obtained by repeating 
a number and comparing the distributions of first and second readings. The two 
English series are not associated with individual crania or other parts of the 
skeleton, and the problem of sexing these is discussed. It is shown that a crude 
mathematical method and anatomical appreciation agree in about 85 per cent. 
of cases, and there is reason to believe that the sexes finally adopted give the 
same order of accuracy as those obtained by sexing a series of crania anatomically. 
The constants of variation reveal a few significant, but very small absolute, 
differences between the variabilities of the different series available, and this 
conclusion is the same as that derived from cranial measurements. At the same 
time the mandible tends to be rather more variable, relative to size, than the 
cranium. Special topics discussed are intra-racial correlations of the measurements,
asymmetry and records relating to the teeth lost before death.

Racial comparisons are made by using the method of the coefficient of racial 
likeness. In all there are 17 male and 9 female series which can be used for this 
purpose, though several of these are evidently too small to be of any permanent 
value by themselves. The coefficients show a number of entirely unexpected 
resemblances and divergences, and it is clear that they do not provide a rational 
classification of the types. In general they differentiate the series far less effectively 
than do the corresponding cranial coefficients. In 6 cases out of 136 comparisons 
there is no evidence of a significant difference judging from the mandibular 
measurements, although the series would be expected to represent quite distinct 
races and the corresponding cranial coefficients indicate clear divergence. It is 
shown that this result is not due to an unsuitable choice of the characters used, 
and that it cannot be attributed, as far as can be seen, to the disturbing influence
of any of the assumptions made, or devices used, in computing the coefficients.
The failure of the method may well be due to the fact that short series of mandibles
are not capable of providing a reliable classification of the races they represent.
The longer series available do give suggestive results, but there are not enough
of them to establish that additional long series will do the same. We
can assert that series made up by 40 or fewer individuals will not give the 
information required, and for such the lack of statistical distinction between two 
types cannot be supposed sufficient evidence of racial identity. Series made up 
by 40-50 individuals may be sufficiently long, but in further investigations on 
the same lines it would be safer to exclude all composed of fewer than 50 bones. 
It is quite likely that it will be possible to demonstrate the utility of the method 
when it is applied to sufficiently long series. 
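The size limits recommended in these conclusions can be written as a simple screening rule. In this minimal sketch `classify_series` is a hypothetical helper, and the example n̄ values are invented. The boundary at exactly 40 follows the conclusions' "40 or fewer"; an earlier passage says "fewer than 40", so that edge case is ambiguous in the source.

```python
def classify_series(n_bar: float) -> str:
    """Hypothetical helper encoding the size limits suggested in the conclusions."""
    if n_bar <= 40:
        return "too short"      # 40 or fewer: will not give the information required
    if n_bar < 50:
        return "doubtful"       # 40 to 50: may be sufficiently long
    return "usable"             # 50 or more: the safer limit for further work


# Hypothetical average sample sizes (n-bar), not values from Table X.
for n_bar in (18.5, 40.0, 44.0, 62.0):
    print(n_bar, classify_series(n_bar))
```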


I wish to thank Dr Morant for the photographs reproduced, and Miss A. B. 
Clements for typing the manuscript of this paper. 


DESCRIPTION OF PLATES 


Plates I, II and III show standard aspects of typical male mandibles, the focal plane of the
camera having been perpendicular or parallel to the standard horizontal plane of the bone. In
these cases a lens with a long focal length was used, and the distance from lens to object was about
24 metres. The small images obtained were enlarged in printing, and the prints are reproduced
here approximately at 0·9 natural size. Distortion may be considered negligible. The photographs
reproduced in Plates IV and V were taken with a lens having a shorter focal length and at a
closer distance. The typical male mandibles were selected by considering the deviations of the 
measurements of shape (angles and indices) for each bone of a series from the means for the series 
in terms of the standard deviations for each of these measurements. Each of the three bones has 
every index and angle differing from the mean for the series to which it belongs by less than 1·2
times the standard deviation of the measurement. Also, their maximum breadths (bicondylar, w,),
lengths (total projective, ml) and heights (projective height of coronoid process, c,/) fall within the
same range, except that the coronoid height of the selected Farringdon Street mandible differs
from the mean for the male series by an amount which is 1·7 times the standard deviation of the
distribution. Bones which are more typical than the three shown could not be found in the short
series available, but it should be realized that a comparison of these may suggest that there are
differences in metrical characters, or anatomical details, which would not be found if truly typical 
specimens—i.e. ones representing the averages in all respects—were available. 
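The rule used to pick the typical mandibles is to keep only bones whose every shape measurement lies within 1·2 standard deviations of the series mean. It can be sketched as a filter. In this sketch the function name, the character names and the five-bone series are all hypothetical, and the means and standard deviations are computed from the series itself.

```python
from statistics import mean, stdev


def typical_specimens(specimens, characters, limit=1.2):
    """Return ids of specimens whose every listed character lies within
    `limit` standard deviations of the series mean."""
    stats = {}
    for c in characters:
        values = [s[c] for s in specimens]
        stats[c] = (mean(values), stdev(values))
    return [
        s["id"] for s in specimens
        if all(abs(s[c] - m) < limit * sd for c, (m, sd) in stats.items())
    ]


# Hypothetical series of five mandibles with two shape characters.
series = [
    {"id": "a", "mental_angle": 70.0, "index": 100.0},
    {"id": "b", "mental_angle": 75.0, "index": 98.0},
    {"id": "c", "mental_angle": 78.0, "index": 101.0},
    {"id": "d", "mental_angle": 80.0, "index": 99.0},
    {"id": "e", "mental_angle": 95.0, "index": 97.0},
]
print(typical_specimens(series, ["mental_angle", "index"]))
```

Requiring every character to pass at once excludes bones that are extreme in only one respect, such as bone "c" here. This is why, in a short series, no candidate may satisfy the rule for every character.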


Plate I. Typical Punjabi (above, No. 6-3616) and Australian (No. 20-3003) male mandibles: 
norma verticalis. These two show little difference in size: for the true racial types the Australian 
seen from this aspect would show a rather larger excess in size over the Punjabi. There are clear 
differences in massiveness, and in the ways in which the teeth are set in the bones. 


Plate II. Typical English (A, Farringdon Street, No. 622), Punjabi (B, No. 6-3616) and Aus- 
tralian (C, No. 20-3003) male mandibles: norma lateralis. The mandibular angles are seen to be 
very close and the means for the three series are closer still. For the true types the breadth of the 
ramus relative to its length would distinguish the Australian from the other two rather more 
clearly than is the case for the selected specimens. The lesser projection of the chin is the most 
striking characteristic of the Australian mandible, and this is also characteristic of the series. 


Plate III. Typical English (A, Farringdon Street, No. 622), Punjabi (B, No. 6-3616) and 
Australian (C, No. 20-3003) male mandibles: norma frontalis. The differences in size are small, but 
the Australian is clearly the most massive bone and the setting of the teeth in it is characteristic. 

Plate IV. Contrasted forms of Australian mandibles. 

A. Male bones with extreme mental angles: 0·7 natural size. The mandible on the left (No. 20-59)
has the lowest mental angle (C′∠ = 65°·5) for the series, and the one on the right (No. 20-6213)
the highest (92°·5). The mean angle is 78°·0 and the typical male (Plate II C) has a reading of
81°·5.

B. Female bones with extreme mental angles: 0·8 natural size. The mandible on the left (No.
20-6202) has the lowest mental angle (71°·0), and the one on the right (No. 20-8461) the highest
(94°·0).

C. The dental arcades of two male Australian mandibles of contrasted forms: 0·9 natural size. The
specimen on the left (No. 20-6211) has a parabolic arch and that on the right (No. 20-8521) differs
from it in having the front teeth (incisors and canines) almost in a straight line. The difference is
seen to be dependent more on the inclinations of the incisors than on the positions of their
sockets.

Plate V. Pathological and anomalous Australian mandibles.

A. A female mandible showing marked erosion of the angles due to syphilis: No. 3955-2, 0·9 natural
size.

B. A male mandible showing severe healed injury of the right ramus: No. 20-8562, 0·9 natural size.

C. A male mandible of a remarkably massive and primitive type: No. 20-8551, 0·9 natural size.

D. A male mandible showing gross overcrowding of the incisors: No. 20-7702, 1·3 natural size.


Biometrika, Vol. XXIX, Parts I and II
Cleaver, Biometric Study of the Human Mandible

Plate I
Typical Punjabi (above) and Australian Male Mandibles.

Plate II
Typical English (A), Punjabi (B) and Australian (C) Male Mandibles.

Plate III
Typical English (A), Punjabi (B) and Australian (C) Male Mandibles.

Plate IV
Contrasted Forms of Australian Mandibles.
C. Different forms of the dental arcade.
A BIOMETRIC STUDY OF THE HUMAN MALAR BONE
By T. L. WOO, Ph.D.


1. Introduction. The measurements of the skull which have been most widely 
used for anthropological purposes were originally defined, or elaborated from 
definitions of earlier workers, by Paul Broca and a number of his German 
contemporaries. The French and German techniques were by no means identical, 
but the two sets of measurements corresponded in a general way. They aimed 
at giving a general description of the cranium considered as a whole and of all 
its principal parts. The framers of the techniques were primarily anatomists 
who had become interested in anthropological problems, but there are no 
peculiarly anatomical considerations underlying their systems. In particular 
there was, in the case of the majority of the measurements, an unfortunate 
disregard of the fact that the skull is made up of a considerable number of 
different bones. Nearly all the later craniometric techniques are based on the 
earlier ones, and their general aim has been to secure greater precision and 
standardization. The result has been that a particular set of measurements has 
been given in a large number of publications for some tens of thousands of skulls 
representing extinct and existing races in all parts of the world. The value of 
this corpus of material is beyond question and, in fact, it is by far the most 
valuable material available at present which can be used to estimate with 
precision the resemblances of different varieties of man. It is known that all
the usual measurements make some clear distinctions when the averages for
different series are compared: in other words, they are all of racial significance.
But it is also known that their relative values for the purpose of differentiating
races differ greatly. Some appear to be almost constant for all races, while
others usually show significant differences, and it is found that there is
something like a gradual transition between these extremes. This grading of 
the characters, in order of their effectiveness as racial criteria, could not be 
appreciated until extensive data had been collected for them. 

The position with regard to the customary measurements suggests that it 
should be possible to select a smaller number of characters which could be used 
as effectively, or more effectively, for purposes of racial classification, with less labour
involved in recording and computing. The list chosen might be made up partly 
by some of the old measurements and partly by new ones. It is generally 
recognized that certain features of the cranium which are obviously of value in 
aiding racial discrimination are not estimated by any of the classical measure- 
ments. New measurements of the "flatness" of the facial skeleton were taken
on nearly 6000 skulls, representing a number of races from different parts of the
world, with the object of examining the value of one such feature.* It was
concluded that a few of these are as useful for the purpose in view as any other
characters that have been dealt with metrically, and far more useful than some 
for which extensive records are available. 

The investigation described in the present paper was undertaken in the hope 
of discovering other new measurements which might be of exceptional value in 
aiding racial classification. In a paper published in 1931† the writer gave
definitions of 25 chords and arcs, of which the majority were new, each being
confined to a single bone of the skull. These were taken on a series (the
Egyptian E) of nearly 900 male specimens obtained from a single cemetery at
Gizeh which was used from the 26th to the 30th dynasties. Among the measure-
ments dealt with in this study are two of the malar bone, the horizontal arc and
the vertical arc. These were determined for both malar bones, so that the
question of asymmetry could be investigated,‡ but no material was collected
then to throw light on the possible sexual and racial significance of the measure- 
ments in question. One of the arcs was taken later by Dr von Bonin on a series 
of New Britain skulls,§ and the material for them presented below relates to an 
additional 710 crania, made up by 14 male and 2 female series representing 
races in different parts of the world. Two additional measurements of the malar 
bones of these 710 specimens were also recorded. 

Anthropologists have hitherto devoted little attention to the malar bone, 
though a number of scattered remarks relating to racial differences in its size 
and form may be found in the literature. The earlier discussions of its metrical 
and anatomical variations are conveniently summarized by Le Double.‖ He
remarks: "It has not yet been conclusively demonstrated, by numerous and precise
measurements, that the malar bone has, all other things being equal, larger
dimensions in one race than in another and, in any given race, in the male than
in the female."

* T. L. Woo and G. M. Morant, "A Biometric Study of the 'Flatness' of the Facial Skeleton
in Man", Biometrika, XXVI (1934), pp. 196-250.
† T. L. Woo, "On the Asymmetry of the Human Skull", Ibid. XXII, pp. 324-52.
‡ These malar bone measurements for the Egyptian series and the index derived from them are
also treated by Karl Pearson and T. L. Woo in "Further Investigation of the Morphometric
Characters of the Human Skull", Ibid. XXVII (1935), pp. 424-65.
§ "On the Craniology of Oceania. Crania from New Britain", Ibid. XXVIII (1936), pp. 123-48.
‖ Traité des Variations des Os de la Face de l'Homme (1906), pp. 114-65.

2. The material measured. All the skulls for which measurements of the
malar bones are given for the first time in this paper are in the Museum of the
Royal College of Surgeons, London. The writer measured them there in 1934
and he is greatly indebted to the authorities of the College, and particularly to
Miss M. L. Tildesley, for granting him ready access to the specimens. The
series are:

(i) English. 43 ♂. These came from a single cemetery at Portugal Street,
London, and they are known as the King's College series. The skulls are probably
of eighteenth-century date.


(ii) French. 283. Most of these came from the catacombs of Paris and 


they are all later than the Merovingian period. Measurements were only taken 
of the complete crania. 


(iii) Italian. 923. These modern skulls came from 12 provinces in the 
northern and central parts cf Italy. 

(iv) Egyptian: dynastic. 26 ♂. These belong to middle and late dynastic
times. Measurements were only taken of the complete crania.

(v) Egyptian: Ptolemaic and Roman. 31 ♂. Measurements were only taken
of the complete crania.

(vi) Negro: Nigeria. 41 ♂. Of this total 34 skulls came from South Nigeria
(representing mainly the Ibibio and Ekoi tribes of the Calabar region) and the
others are from different parts of North Nigeria.

(vii) Negro: Congo. 36 ♂ and 21 ♀. The majority of these specimens
represent the Batetela tribe who live near the Lubefu River.

(viii) Hindu: Bihar and Orissa. 36 ♂. Several castes of Hindus are
represented, and the majority of the specimens came from the Patna district in the
north-west of Bihar.


(ix) Punjabi. 80 ♂. These crania are of Mohammedans and several castes
of Hindus.


(x) Javanese. 45 ♂. These came from various parts of Java and the
neighbouring islands.

(xi) Chinese. 63 ♂. Nearly half of these specimens are known to have come
from various localities on the south-east coast of China, and the majority of the 
others probably came from the south of the country. 

(xii) Eskimo. 29 ♂. These came from various parts of Greenland and
neighbouring islands to the west. 

(xiii) Maori: New Zealand. 39 ♂. The majority of these specimens came
from the North Island, principally from the vicinity of Auckland, but some are 
from unknown localities. 

(xiv) Kanaka. 50 ♂ and 50 ♀. These specimens came from the Islands of
Oahu and Hawaii, and the population of the former is better represented than
that of the latter.


Every one of these 639 male and 71 female skulls is sufficiently complete to
give all the measurements defined in the following section.


3. Definitions of measurements of the malar bone. Fig. 1 shows the left
malar bone and surrounding regions of the facial skeleton: FMT is the point
where the malar ridge crosses the fronto-malar suture, and this is practically
the same as Martin's fronto-malare temporale; ZM is the lowest point on the
malar-maxillary suture, so it is his zygomaxillare. The other two points used are
not defined by Martin. O is the point where the malar-maxillary suture crosses
the lower margin of the orbit, and ZT is the lowest point on the zygomatic
suture which is still on the lateral surface of the arch. The measurements are:

(a) Ml₁ = minimum horizontal arc from O to ZT.

(b) Ml₂ = minimum vertical arc from FMT to ZM.

(c) 100 Ml₂/Ml₁.

These three are the malar-bone measurements of the earlier studies of
measurements of single bones of the cranium. They are available for the long
Egyptian series of male skulls* and for all the new material. Ml₁ is also
available for the New Britain series. The arcs, taken with a steel tape, are
recorded to the nearest 0.5 of a mm.


Fig. 1. The left malar bone and surrounding region, showing 
measurements taken. 


(d) C = chord between the terminals of the horizontal arc (O and ZT).

(e) S = maximum subtense from this chord to the line marking the direction
of the minimum horizontal arc (Ml₁). This line is first marked in pencil
on the surface of the bone.

(f) 100 S/C. This provides a measure of the curvature of the horizontal
arc.


These three measurements are only available for the new material. 
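As a minimal sketch (the function name `malar_indices` is hypothetical; the author's own calculations were made by hand), the two indices (c) and (f) can be computed from the four absolute readings on a single bone. The illustrative inputs are the Eskimo right-side means of Table I. The tabulated mean indices are averages of individual indices, which need not equal the index of the means (here about 74.6 against the tabulated 74.8).

```python
def malar_indices(ml1, ml2, chord, subtense):
    """Return (100 Ml2/Ml1, 100 S/C) for a single malar bone.

    ml1      -- minimum horizontal arc from O to ZT (mm)
    ml2      -- minimum vertical arc from FMT to ZM (mm)
    chord    -- chord C between O and ZT (mm)
    subtense -- maximum subtense S from the chord to the arc line (mm)
    """
    if ml1 <= 0 or chord <= 0:
        raise ValueError("horizontal arc and chord must be positive")
    arc_index = 100.0 * ml2 / ml1               # vertical arc as % of horizontal arc
    curvature_index = 100.0 * subtense / chord  # curvature of the horizontal arc
    return arc_index, curvature_index


if __name__ == "__main__":
    # Eskimo male, right side: Table I means used as illustrative inputs
    arc_idx, curv_idx = malar_indices(70.5, 52.6, 62.4, 14.5)
    print(f"100 Ml2/Ml1 = {arc_idx:.1f}, 100 S/C = {curv_idx:.1f}")
```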

The chord and the subtense were taken at the same time with the aid of a
pair of co-ordinate callipers which was made for the writer by W. F. Stanley and
Co. (London). This is similar in construction to the co-ordinate callipers made
by P. Hermann, Rickenbach u. Sohn (Zürich), which could be used for the
purpose, but the subtense arm of the new form terminates in a narrow straight
edge instead of a sharp tip. This makes it possible to determine the maximum
subtense by merely bringing the edge down on to the arc, marked in pencil on
the bone, from which the subtense is taken: there is no need to repeat this
operation several times in order to find the maximum reading, as is necessary
when the form of callipers with a pointed subtense arm is used. Both scales of
the new instrument have verniers attached, and the chords and subtenses were
recorded to the nearest 0.1 of a mm.

* An error was made in the tables of the asymmetry paper cited: the symbols Ml₁ and Ml₂
should be interchanged in these.


TABLE I. Mean malar-bone measurements and their probable errors

                                             Horizontal arc (Ml₁)        Vertical arc (Ml₂)          100 Ml₂/Ml₁
Series                        Sex      No.   R             L             R             L             R             L
Eskimo                         ♂        29   70.5 ± .75    71.0 ± .65    52.6 ± .48    52.2 ± .47    74.8 ± .69    73.4 ± .73
Chinese                        ♂        63   62.7 ± .35    62.8 ± .33    49.7 ± .23    50.1 ± .24    79.4 ± .53    80.0 ± .51
Javanese                       ♂        45   61.3 ± .42    60.8 ± .40    50.4 ± .35    50.5 ± .37    82.4 ± .58    83.3 ± .69
Kanaka                         ♂        50   62.3 ± .34    62.5 ± .36    49.7 ± .25    49.7 ± .25    79.8 ± .47    79.6 ± .49
Maori                          ♂        39   62.6 ± .46    62.5 ± .41    49.8 ± .34    49.9 ± .39    79.7 ± .57    80.0 ± .60
Negro: Nigeria                 ♂        41   60.8 ± .46    61.6 ± .49    49.2 ± .29    49.4 ± .33    81.3 ± .59    80.4 ± .66
Negro: Congo                   ♂        36   58.5 ± .55    59.1 ± .51    47.8 ± .31    48.0 ± .36    82.2 ± .69    81.1 ± .71
Egyptian: dynastic*            ♂        26   58.3 ± .60    58.6 ± .49    47.3 ± .41    47.7 ± .41    81.3 ± .69    81.9 ± .70
Egyptian: 26th-30th dynasties  ♂ 716, etc.   59.4 ± .110†  59.6 ± .115†  49.4 ± .073‡  50.0 ± .076‡  83.2 ± .148§  84.0 ± .152§
Egyptian: Ptolemaic and Roman  ♂        31   58.4 ± .48    59.0 ± .47    47.6 ± .36    48.2 ± .40    81.9 ± .71    82.5 ± .78
Hindu: Bihar and Orissa        ♂        36   55.6 ± .34    56.5 ± .35    45.8 ± .32    45.9 ± .33    82.4 ± .62    81.4 ± .59
Punjabi                        ♂        80   57.6 ± .33    58.2 ± .33    47.3 ± .20    47.5 ± .25    82.5 ± .49    81.9 ± .49
Italian                        ♂        92   58.9 ± .26    59.0 ± .27    48.0 ± .21    48.1 ± .22    81.7 ± .38    81.8 ± .40
English                        ♂        43   58.8 ± .40    59.1 ± .39    49.8 ± .25    50.2 ± .30    85.0 ± .43    85.3 ± .52
French                         ♂        28   57.9 ± .45    57.9 ± .51    48.3 ± .38    48.7 ± .43    83.6 ± .81    84.5 ± .80
Kanaka                         ♀        50   57.9 ± .36    58.5 ± .38    46.2 ± .25    45.9 ± .26    79.8 ± .37    78.7 ± .44
Negro: Congo                   ♀        21   57.2 ± .59    57.3 ± .60    46.5 ± .41    46.5 ± .35    81.6 ± .83    81.5 ± .80

                                             Horizontal chord (C)        Subtense to chord (S)       100 S/C
Series                        Sex      No.   R             L             R             L             R             L
Eskimo                         ♂        29   62.4 ± .59    62.1 ± .59    14.5 ± .24    14.7 ± .30    23.2 ± .26    23.6 ± .29
Chinese                        ♂        63   55.3 ± .30    54.9 ± .26    12.5 ± .14    12.5 ± .14    22.5 ± .21    22.7 ± .23
Javanese                       ♂        45   54.3 ± .40    53.9 ± .36    12.1 ± .17    11.9 ± .17    22.2 ± ?      22.0 ± .25
Kanaka                         ♂        50   55.8 ± .31    55.6 ± .35    11.9 ± .14    12.0 ± .14    21.4 ± .21    21.7 ± .23
Maori                          ♂        39   56.0 ± .39    55.3 ± .37    11.7 ± .11    11.8 ± .16    20.9 ± .20    21.3 ± .24
Negro: Nigeria                 ♂        41   54.6 ± .41    54.5 ± .43    11.7 ± .14    11.3 ± .12    21.4 ± .23    20.8 ± .20
Negro: Congo                   ♂        36   52.1 ± .44    52.3 ± .46    10.8 ± .16    10.6 ± .15    20.7 ± .23    20.3 ± .21
Egyptian: dynastic*            ♂        26   52.5 ± .53    52.5 ± .46    10.0 ± .15    10.1 ± .12    19.1 ± .25    19.3 ± .24
Egyptian: Ptolemaic and Roman  ♂        31   52.5 ± .36    53.0 ± .39    10.9 ± .17    10.9 ± .18    20.9 ± .26    20.5 ± .27
Hindu: Bihar and Orissa        ♂        36   49.8 ± .30    50.1 ± .29    10.1 ± .12    10.2 ± .10    20.4 ± .24    20.4 ± .20
Punjabi                        ♂        80   51.7 ± .28    51.8 ± .28    10.4 ± .12    10.5 ± .11    20.1 ± .18    20.1 ± .17
Italian                        ♂        92   53.5 ± .24    53.2 ± .25    9.6 ± .08     9.8 ± .09     18.0 ± .14    18.4 ± .14
English                        ♂        43   53.7 ± .33    53.4 ± .32    9.55 ± .11    9.3 ± .13     17.8 ± .23    17.5 ± .21
French                         ♂        28   52.9 ± .38    52.5 ± .39    9.3 ± .15     9.2 ± .15     17.7 ± .25    17.4 ± .20
Kanaka                         ♀        50   52.1 ± .31    52.0 ± .31    10.8 ± .13    10.9 ± .13    20.7 ± .22    20.8 ± .19
Negro: Congo                   ♀        21   51.2 ± .51    50.8 ± .50    10.6 ± .21    10.5 ± .22    20.7 ± .35    20.7 ± .35

* New Empire and late dynastic.  † For 718 skulls.  ‡ For 817 skulls.  § For 716 skulls.


TABLE II. Variabilities of malar-bone measurements

S.D. = standard deviation; C.V. = coefficient of variation.

                                             Horizontal arc (Ml₁)                            Vertical arc (Ml₂)
Series                        Sex      No.   S.D. R      S.D. L      C.V. R      C.V. L      S.D. R      S.D. L      C.V. R      C.V. L
Eskimo                         ♂        29   6.00 ± .53  5.21 ± .46  8.51 ± .76  7.33 ± .65  3.81 ± .34  3.73 ± .33  7.25 ± .65  7.15 ± .64
Chinese                        ♂        63   4.07 ± .25  3.86 ± .23  6.49 ± .39  6.14 ± .37  2.66 ± .16  2.82 ± .17  5.36 ± .32  5.62 ± .34
Javanese                       ♂        45   4.18 ± .30  4.00 ± .28  6.82 ± .49  6.58 ± .47  3.48 ± .25  3.66 ± .26  6.91 ± .49  7.26 ± .52
Kanaka                         ♂        50   3.52 ± .24  3.74 ± .25  5.65 ± .38  5.98 ± .40  2.65 ± .18  2.66 ± .18  5.32 ± .36  5.36 ± .36
Maori                          ♂        39   4.29 ± .33  3.83 ± .29  6.86 ± .53  6.13 ± .47  3.17 ± .24  3.61 ± .28  6.36 ± .49  7.24 ± .56
Negro: Nigeria                 ♂        41   4.40 ± .33  4.65 ± .35  7.24 ± .54  7.55 ± .57  2.76 ± .21  3.10 ± .23  5.61 ± .42  6.28 ± .47
Negro: Congo                   ♂        36   4.89 ± .39  4.55 ± .36  8.35 ± .67  7.70 ± .62  2.77 ± .22  3.18 ± .25  5.80 ± .46  6.62 ± .53
Egyptian: dynastic*            ♂        26   4.53 ± .42  3.71 ± .35  7.76 ± .73  6.32 ± .59  3.10 ± .29  3.11 ± .29  6.56 ± .62  6.52 ± .61
Egyptian: 26th-30th dynasties† ♂ 716, etc.   4.38 ± .08  4.55 ± .08  7.37 ± .13  7.64 ± .14  3.11 ± .05  3.21 ± .05  6.30 ± .11  6.42 ± .11
Egyptian: Ptolemaic and Roman  ♂        31   3.99 ± .34  3.90 ± .33  6.83 ± .59  6.61 ± .57  3.01 ± .26  3.28 ± .28  6.32 ± .54  6.79 ± .58
Hindu: Bihar and Orissa        ♂        36   2.99 ± .24  3.09 ± .25  5.38 ± .43  5.47 ± .44  ?           2.97 ± .24  6.26 ± .50  6.47 ± .52
Punjabi                        ♂        80   4.32 ± .23  4.37 ± .23  7.51 ± .40  7.51 ± .40  2.70 ± .14  3.34 ± .18  5.70 ± .30  7.04 ± .38
Italian                        ♂        92   3.70 ± .18  3.83 ± .19  6.28 ± .31  6.49 ± .32  2.95 ± .15  3.07 ± .15  6.15 ± .31  6.38 ± .32
English                        ♂        43   3.91 ± .29  3.76 ± .27  6.65 ± .49  6.36 ± .46  2.40 ± .18  2.96 ± .22  4.82 ± .35  5.90 ± .43
French                         ♂        28   3.55 ± .32  3.99 ± .36  6.14 ± .56  6.90 ± .62  3.01 ± .27  3.40 ± .31  6.23 ± .56  6.98 ± .63
Kanaka                         ♀        50   3.75 ± .25  3.94 ± .27  6.47 ± .44  6.73 ± .46  2.62 ± .18  2.69 ± .18  5.68 ± .38  5.86 ± .40
Negro: Congo                   ♀        21   4.02 ± .42  4.10 ± .43  7.03 ± .74  7.16 ± .75  2.77 ± .29  2.40 ± .25  5.96 ± .62  5.16 ± .54

                                             100 Ml₂/Ml₁             Horizontal chord (C)
Series                        Sex      No.   S.D. R      S.D. L      S.D. R      S.D. L      C.V. R      C.V. L
Eskimo                         ♂        29   5.54 ± .49  5.80 ± .51  4.72 ± .42  4.74 ± .42  7.56 ± .67  7.63 ± .68
Chinese                        ♂        63   6.29 ± .38  5.95 ± .36  3.50 ± .21  3.10 ± .19  6.32 ± .38  5.64 ± .34
Javanese                       ♂        45   5.80 ± .41  6.83 ± .49  3.99 ± .28  3.56 ± .25  7.34 ± .52  6.60 ± .47
Kanaka                         ♂        50   4.96 ± .33  5.18 ± .35  3.21 ± .22  3.69 ± .25  5.76 ± .39  6.64 ± .45
Maori                          ♂        39   5.31 ± .41  5.60 ± .43  3.62 ± .28  3.39 ± .26  6.47 ± .50  6.13 ± .47
Negro: Nigeria                 ♂        41   5.58 ± .42  6.22 ± .46  3.86 ± .29  4.04 ± .31  7.07 ± .53  7.40 ± .55
Negro: Congo                   ♂        36   6.13 ± .49  6.35 ± .50  3.96 ± .31  4.09 ± .33  7.60 ± .61  7.82 ± .63
Egyptian: dynastic*            ♂        26   5.18 ± .48  5.25 ± .49  3.98 ± .37  3.46 ± .32  7.58 ± .71  6.60 ± .62
Egyptian: 26th-30th dynasties† ♂ 716, etc.   5.85 ± .10  6.02 ± .11
Egyptian: Ptolemaic and Roman  ♂        31   5.86 ± .50  6.40 ± .55  3.00 ± .26  3.20 ± .27  5.71 ± .49  6.05 ± .52
Hindu: Bihar and Orissa        ♂        36   5.53 ± .44  5.23 ± .42  2.70 ± .22  2.56 ± .20  5.43 ± .43  5.11 ± .41
Punjabi                        ♂        80   6.56 ± .35  6.55 ± .35  3.72 ± .20  3.76 ± .20  7.19 ± .39  7.26 ± .39
Italian                        ♂        92   5.40 ± .27  5.64 ± .28  3.37 ± .17  3.50 ± .17  6.31 ± .31  6.58 ± .33
English                        ♂        43   4.18 ± .30  5.02 ± .36  3.19 ± .23  3.08 ± .22  5.93 ± .43  5.78 ± .42
French                         ♂        28   6.39 ± .58  6.28 ± .57  3.00 ± .27  3.05 ± .27  5.67 ± .51  5.81 ± .53
Kanaka                         ♀        50   3.92 ± .26  4.56 ± .31  3.27 ± .22  3.24 ± .22  6.27 ± .42  6.22 ± .42
Negro: Congo                   ♀        21   5.62 ± .59  5.40 ± .56  3.45 ± .36  3.42 ± .36  6.75 ± .71  6.74 ± .70

                                             Subtense to chord (S)                                   100 S/C
Series                        Sex      No.   S.D. R        S.D. L        C.V. R        C.V. L        S.D. R        S.D. L
Eskimo                         ♂        29   1.89 ± .17    2.36 ± .21    13.05 ± 1.18  16.03 ± 1.46  2.09 ± .19    2.30 ± .20
Chinese                        ♂        63   1.61 ± .10    1.70 ± .10    12.84 ± .78   13.60 ± .83   2.41 ± .15    2.66 ± .16
Javanese                       ♂        45   1.67 ± .12    1.66 ± .12    13.72 ± .99   14.06 ± 1.02  2.54 ± .18    2.47 ± .18
Kanaka                         ♂        50   1.43 ± .10    1.51 ± .10    12.03 ± .82   12.55 ± .86   2.15 ± .15    2.41 ± .16
Maori                          ♂        39   1.04 ± .08    1.47 ± .11    8.86 ± .68    12.48 ± .97   1.86 ± .14    2.22 ± .17
Negro: Nigeria                 ♂        41   1.36 ± .10    1.16 ± .09    11.64 ± .88   10.27 ± .77   2.14 ± .16    1.88 ± .14
Negro: Congo                   ♂        36   1.44 ± .11    1.36 ± .11    13.31 ± 1.08  12.84 ± 1.04  2.02 ± .16    1.91 ± .15
Egyptian: dynastic*            ♂        26   1.17 ± .11    0.89 ± .08    11.67 ± 1.11  8.78 ± .83    1.86 ± .17    1.80 ± .17
Egyptian: Ptolemaic and Roman  ♂        31   1.36 ± .12    1.49 ± .13    12.50 ± 1.09  13.67 ± 1.19  2.14 ± .18    2.20 ± .19
Hindu: Bihar and Orissa        ♂        36   1.10 ± .09    0.93 ± .07    10.84 ± .87   9.08 ± .73    2.10 ± .17    1.75 ± .14
Punjabi                        ♂        80   1.57 ± .08    1.48 ± .08    15.12 ± .82   14.14 ± .77   2.39 ± .13    2.25 ± .12
Italian                        ♂        92   1.20 ± .06    1.32 ± .07    12.49 ± .63   12.28 ± .62   2.02 ± .10    2.04 ± .10
English                        ♂        43   1.27 ± .09    1.05 ± .08    13.33 ± .99   11.26 ± .83   2.21 ± .16    2.00 ± .15
French                         ♂        28   1.17 ± .11    1.15 ± .10    12.57 ± 1.15  12.48 ± 1.14  1.97 ± .18    1.57 ± .14
Kanaka                         ♀        50   1.39 ± .09    1.37 ± .09    12.83 ± .88   12.57 ± .86   2.30 ± .16    2.03 ± .14
Negro: Congo                   ♀        21   1.45 ± .15    1.52 ± .16    13.66 ± 1.45  14.52 ± 1.54  2.35 ± .25    2.39 ± .25

* New Empire and late dynastic.  † See Table I.
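The probable errors in Tables I and II appear consistent with the usual conventions of the period: 0.6745 times the standard error, with √(2n) in the denominator for a standard deviation and an extra factor √(1 + 2(V/100)²) for a coefficient of variation V. The paper does not state these formulae, so the sketch below is an assumption (the function name `probable_errors` is hypothetical); it does reproduce the Eskimo right horizontal-arc entries (70.5 ± .75, S.D. 6.00 ± .53, C.V. 8.51 ± .76) from the mean, S.D. and n alone.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def probable_errors(mean, sd, n):
    """Probable errors of a mean, an S.D. and a coefficient of variation."""
    v = 100.0 * sd / mean  # coefficient of variation
    se_mean = sd / math.sqrt(n)
    se_sd = sd / math.sqrt(2 * n)
    se_v = v / math.sqrt(2 * n) * math.sqrt(1 + 2 * (v / 100.0) ** 2)
    return {
        "pe_mean": PE_FACTOR * se_mean,
        "pe_sd": PE_FACTOR * se_sd,
        "v": v,
        "pe_v": PE_FACTOR * se_v,
    }


if __name__ == "__main__":
    # Eskimo male, right horizontal arc: mean 70.5 mm, S.D. 6.00 mm, n = 29
    for name, value in probable_errors(70.5, 6.00, 29).items():
        print(f"{name}: {value:.2f}")
```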


4. Sexual and bilateral comparisons. The metrical material available for
estimating sexual differences for the malar bone is very scanty. There is one
series of 50 male and 50 female Kanaka skulls and another of 36 male and
21 female Congo-Negro. Means are given in Table I and variabilities in Table II.
The numbers are too small to give reliable sex ratios (male mean/female mean)
for the absolute measurements, but as far as can be seen these are not peculiar
for cranial measurements. The eight constants range from 1.068 to 1.102 for the
Kanaka series and from 1.010 to 1.032 for the Congo, but it would be unwise to
assume from such slender evidence that the races are distinguished by their
average sex differences. There is no suggestion that the true ratios are
significantly different for different measurements, or for the right and left sides in the
case of the same measurement. No clearly significant differences are found
between the corresponding male and female indices, and it is clear that any
sexual differentiation (apart from that in absolute size) which might be
deduced from the measurements could only be revealed by data for larger
samples.
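The sex ratios quoted above can be checked directly from the means of Table I. The sketch below (the dictionary layout and the function name `sex_ratios` are illustrative only) divides each male mean by the corresponding female mean for the four absolute measurements on both sides, giving the eight constants per race.

```python
# Means (mm) from Table I, in the order:
# Ml1 R, Ml1 L, Ml2 R, Ml2 L, chord C R, chord C L, subtense S R, subtense S L
KANAKA = {
    "male":   [62.3, 62.5, 49.7, 49.7, 55.8, 55.6, 11.9, 12.0],
    "female": [57.9, 58.5, 46.2, 45.9, 52.1, 52.0, 10.8, 10.9],
}
CONGO = {
    "male":   [58.5, 59.1, 47.8, 48.0, 52.1, 52.3, 10.8, 10.6],
    "female": [57.2, 57.3, 46.5, 46.5, 51.2, 50.8, 10.6, 10.5],
}


def sex_ratios(series):
    """Male mean / female mean for each measurement and side."""
    return [m / f for m, f in zip(series["male"], series["female"])]


if __name__ == "__main__":
    for name, series in (("Kanaka", KANAKA), ("Congo", CONGO)):
        ratios = sex_ratios(series)
        print(f"{name}: {min(ratios):.3f} to {max(ratios):.3f}")
```

Run as a script, it prints ranges of 1.068 to 1.102 for the Kanaka and 1.010 to 1.032 for the Congo series, agreeing with the figures quoted above.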

The data which can be used to examine bilateral differences for male
skulls are far more extensive. In estimating the significance of such constants
the bilateral correlations have to be taken into account, and these are given in
Table III for the long Egyptian and the two longest of the new series. In the
case of the three comparisons which can be made, the Egyptian constant is
greater than those for the other two series, and most of the amounts by which
its value exceeds theirs are significant. For all six characters, however, no
significant differences are found between the corresponding Italian and Punjabi
correlations. It is commonly found for anthropometric material that the more
homogeneous series give the higher correlations, and this relation is observed
in the present case. There are no marked differences between the correlations
for different characters, except that those for the subtense and the index
involving the subtense tend to be lower than the others. It may be suggested that
this is due to the fact that readings of the subtense (quite the smallest
measurement) were not recorded in sufficiently small units to give reliable
correlations. An examination of the constants for the Italian series shows that
this is not the case, however. Its highest bilateral correlation is for the vertical
arc. Readings of this measurement were taken to the nearest 0.5 of a mm. and the
standard deviation for it is about 3 mm., so the unit of measurement is about
one-sixth of the σ. Readings of the subtense were taken to the nearest 0.1 of a mm.
and the standard deviation for it is of the order 1.2 mm., so the unit of measurement
is here about one-twelfth of the σ. Merely on account of the way in which the
measurements were taken, it might hence be expected that the arc would show
a lower bilateral correlation than the subtense, but actually the latter has the
lower value.

Comparisons between the means and standard deviations for the right and
left sides are summarized in Table IV, and the data there refer only to the
fourteen new series. Treating each character separately, the numbers of series
showing the right constant greater than the left, equality, and the left constant
greater than the right are given, and also all the ratios of the differences to their
probable errors which exceed 3.5. In calculating these ratios* the bilateral
correlations used were: (i) in the case of Ml₁, Ml₂ and 100 Ml₂/Ml₁, those of the
long Egyptian series in all comparisons except those for the Italian and Punjabi
series, the appropriate correlations in Table III being used for these; (ii) in the
case of C, S and 100 S/C, those of the Italian series in all comparisons except
the Punjabi. It will be seen from Table IV that few markedly significant
differences are found for any character: larger series than any dealt with there
are generally needed to reveal beyond question the asymmetry in type of any
cranial measurement. Considering all the series together, there is a clear
suggestion that both the horizontal and vertical arcs of the malar bone tend to be
larger in size on the left than on the right. This accords with the results obtained
for the paired bones (approximately 800 in number) of the male Egyptian
series of skulls, for which both differences of means are significant and of the
same sign (L > R). For the 50 male skulls from New Britain Dr von Bonin
found the left mean of the horizontal arc 0.1 mm. greater than the right, though
this difference is quite insignificant. The means of the other characters give no
clear indication of asymmetry, and it is curious that this should be so for the
horizontal chord, since the arc which has the same terminals shows a different
relation. The comparisons in Table IV for standard deviations suggest that
variability on the left side exceeds that on the right in the case of Ml₂ and
100 Ml₂/Ml₁, but that there is no bilateral difference in variability in the case of
the other characters. For the long Egyptian series the left standard deviation
was found to be significantly in excess of the right in the case of both Ml₁ and
Ml₂. Of the four absolute measurements, Ml₂ is the only one for which there
is a suggestion of a bilateral difference in relative variability, measured by the
coefficient of variation. For this character the left constant exceeds the right
in the case of 12 of the 14 short series: the long Egyptian series shows differences
of the same sign for both Ml₁ and Ml₂, but the former is significant and the
latter is not.


* The formulae which have to be used are given on pp. 329 and 337 of the writer’s paper on 
asymmetry cited above. 
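The exact formulae are those cited in the footnote and are not reproduced here. As a hedged sketch, assuming the standard expression for the difference of two correlated means, PE(L − R) = 0.6745 √(σ_R² + σ_L² − 2rσ_Rσ_L)/√n, the Hindu horizontal-arc entry of Table IV can be approximately recovered from the rounded constants of Tables I and II and the long Egyptian correlation of Table III. The function name `bilateral_difference` is hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def bilateral_difference(mean_r, mean_l, sd_r, sd_l, n, r):
    """Return (L - R, its probable error, |L - R| / PE) for paired readings.

    r is the bilateral correlation between right and left measurements.
    """
    diff = mean_l - mean_r
    var = sd_r ** 2 + sd_l ** 2 - 2 * r * sd_r * sd_l
    pe = PE_FACTOR * math.sqrt(var) / math.sqrt(n)
    return diff, pe, abs(diff) / pe


if __name__ == "__main__":
    # Hindu male horizontal arc (Tables I, II); Egyptian correlation (Table III)
    d, pe, ratio = bilateral_difference(55.6, 56.5, 2.99, 3.09, 36, 0.9399)
    print(f"L - R = {d:.1f}, PE = {pe:.3f}, ratio = {ratio:.1f}")
```

With these rounded inputs the probable error comes out at about 0.119 and the ratio at about 7.6, close to the 0.120 and 7.5 obtained in the paper from unrounded constants.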
TABLE III. Bilateral correlations of measurements of malar bones: male series

Series                          No.        Horizontal arc   Vertical arc     Horizontal
                                           (Ml₁)            (Ml₂)            chord (C)
Italian                         92         .7327 ± .0223    .9123 ± .0118    .8708 ± .0170
Punjabi                         80         .8248 ± .0241    .8498 ± .0210    .9130 ± .0126
Egyptian: 26th-30th dynasties   716, etc.  .9399 ± .0029*   .9219 ± .0035†

Series                          No.        Subtense to      100 Ml₂/Ml₁      100 S/C
                                           chord (S)
Italian                         92         .6644 ± .0393    .8017 ± .0251    .5782 ± .0468
Punjabi                         80         .7541 ± .0325    .7430 ± .0338    .6970 ± .0366
Egyptian: 26th-30th dynasties   716, etc.                   .8806 ± .0057‡

* For 718 skulls.  † For 817 skulls.  ‡ For 716 skulls.


TABLE IV. Bilateral comparisons of constants for malar-bone measurements:
fourteen male series

                         R > L                                Equality   L > R
                         No.      Significant differences    No.        No.      Significant differences
Means
Horizontal arc (Ml₁)       2      none                         1         11      Negro: Nigeria (4.8), Egyptian: Ptolemaic and Roman (3.6), Hindu (7.5)
Vertical arc (Ml₂)         1      none                         1         12      Chinese (4.3), Egyptian: Ptolemaic and Roman (3.9)
100 Ml₂/Ml₁                6      Eskimo (4.0)                 0          8      none
Horizontal chord (C)       9      Maori (3.6)                  1          4      none
Subtense to chord (S)      5      Negro: Nigeria (3.7)         2          7      none
100 S/C                    6      none                         2          6      none
Standard deviations
Horizontal arc (Ml₁)       8      Egyptian: dynastic (4.1)     0          6      none
Vertical arc (Ml₂)         1      none                         0         13      Punjabi (5.1), English (4.8)
100 Ml₂/Ml₁                4      none                         0         10      English (3.7)
Horizontal chord (C)       6      none                         0          8      none
Subtense to chord (S)      8      none                         0          6      Maori (4.2)
100 S/C                    8      none                         0          6      none
Bilateral comparisons of three constants have been made in Tables III and IV
on the assumption that all races tend to show the same asymmetry in type or
variability. The validity of this assumption may be examined. In the case of any
particular one of the constants, it is clear that in the vast majority of cases the
bilateral difference for a series A and the corresponding difference for another
series B will be found to differ insignificantly from one another. By selecting
extreme values, however, some significant differences of differences might be
found. An examination of Table IV suggests that the clearest evidence of a
racial difference in asymmetry, if such exist, is most likely to be shown in the
case of the means for the horizontal arc (Ml₁). At one extreme the Hindu series
has a mean for the left side which is 0.9 greater than that for the right, and the
probable error of this difference is found to be 0.120 on the assumption that the
bilateral correlation is the same as that for the long Egyptian series. At the
other extreme the Javanese series has a mean for the right side which is 0.5
greater than that for the left, and the probable error of this difference is found
to be 0.144, on the same assumption. The difference of the differences for the
two series is 1.4, and this is 7.5 times its probable error (0.187). This appears to
afford clear evidence of racial distinction, but it is somewhat uncertain owing to
the fact that the Egyptian bilateral correlation may differ appreciably from the
unknown Hindu and Javanese values. It must also be remembered that the
case considered is an extreme value in a series of differences, so that a higher
ratio of the constant to its probable error must be taken to indicate significance
than would be the case if a single difference were being considered by itself.
It will be safest to conclude that racial differences in asymmetry are certainly
very small, and more abundant material would be needed in order to demonstrate
beyond question that any races are differentiated in this way in the case of
characters of the malar bone.
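The Hindu and Javanese bilateral differences come from independent series, so the probable error of their difference is the root-sum-square of the two probable errors. A minimal sketch, taking the probable errors quoted above as inputs (the function name `difference_of_differences` is hypothetical):

```python
import math


def difference_of_differences(d1, pe1, d2, pe2):
    """Return (d1 - d2, its probable error, ratio) for two independent differences."""
    dd = d1 - d2
    pe = math.sqrt(pe1 ** 2 + pe2 ** 2)
    return dd, pe, abs(dd) / pe


if __name__ == "__main__":
    # L - R for the horizontal arc: Hindu +0.9 (PE 0.120), Javanese -0.5 (PE 0.144)
    dd, pe, ratio = difference_of_differences(0.9, 0.120, -0.5, 0.144)
    print(f"difference = {dd:.1f}, PE = {pe:.3f}, ratio = {ratio:.1f}")
```

This gives a difference of 1.4 with a probable error of 0.187, a ratio of 7.5, as in the text.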


5. The value of the measurements for the purpose of racial classification. The 
measurements were taken with the primary object of discovering whether their 
averages for different series provide suggestive arrangements which might aid 
attempts to determine the racial relationships of the populations represented. 
The number of series measured is not large, but it should be sufficient to show 
which of the characters of the malar bone are likely to be most useful for the 
purpose in view. The means are given in Table I and the arrangements provided 
by three different pairs of the characters are shown in Figs. 2-4. In considering 
these figures it is necessary to appreciate in a general way the differences for 
each variate which may be taken to indicate clear differentiation. 

Fig. 2 shows the inter-racial correlation of the horizontal and vertical arcs, 
the points being determined by the male means for the left side. In the case of 
both of these characters most of the differences between the means are large 
enough to indicate statistical significance. For the Punjabi and Hindu means in
the left-hand bottom corner of the diagram, for example, the difference in the
case of the horizontal arc is 3.3 times its probable error, and that for the vertical
arc 3.8 times its probable error. Both measurements are capable of making
many clear distinctions between the racial types, and there is a suggestion that
different members of the same family of races show small differences, while most
of these families are distinguished from one another by occupying different areas
of the bi-variate distribution. More abundant material would obviously be 
required to substantiate these points. The inter-racial distribution of either arc 
considered singly appears to be fairly continuous if the Eskimo series is omitted.* 
This has means widely removed and differing with marked significance from 
those for all the other series, and the large size of its malar bones appears to be a 
salient characteristic of the specialized Eskimo type. It should be noted that the 
measurements are not available for any American-Indian series. The two arcs 
appear to be highly correlated inter-racially, and the point for the Eskimo series 
is clearly the one which would be farthest removed from the regression straight 
line. Owing to the high correlation, we may anticipate that the index expressing 
the vertical as a percentage of the horizontal arc will be of little interest. 
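Fig. 2 is not reproduced here, but the inter-racial correlation it illustrates can be estimated roughly as an ordinary product-moment correlation between the left-side male means of Table I for the fourteen new series. This is an illustration only: it gives equal weight to each series, ignores sampling errors, and the paper does not state that such a coefficient was calculated. The names `LEFT_MEANS` and `pearson_r` are hypothetical.

```python
import math

# Left-side male means (mm) from Table I: (horizontal arc Ml1, vertical arc Ml2)
LEFT_MEANS = {
    "Eskimo": (71.0, 52.2), "Chinese": (62.8, 50.1), "Javanese": (60.8, 50.5),
    "Kanaka": (62.5, 49.7), "Maori": (62.5, 49.9), "Negro: Nigeria": (61.6, 49.4),
    "Negro: Congo": (59.1, 48.0), "Egyptian: dynastic": (58.6, 47.7),
    "Egyptian: Ptolemaic and Roman": (59.0, 48.2), "Hindu": (56.5, 45.9),
    "Punjabi": (58.2, 47.5), "Italian": (59.0, 48.1), "English": (59.1, 50.2),
    "French": (57.9, 48.7),
}


def pearson_r(pairs):
    """Product-moment correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    all_series = list(LEFT_MEANS.values())
    without_eskimo = [v for k, v in LEFT_MEANS.items() if k != "Eskimo"]
    print(f"r (14 series)      = {pearson_r(all_series):.2f}")
    print(f"r (Eskimo omitted) = {pearson_r(without_eskimo):.2f}")
```

On these means the coefficient is roughly 0.84 with the Eskimo series and about 0.80 without it, in keeping with the statement that the two arcs are highly correlated inter-racially.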

Fig. 3 shows the distribution of the series given by their means for the 
horizontal chord, and the maximum subtense to this chord. It is again found
that most of the differences between the points are statistically significant in the 
case of each variate, and the Eskimo means again differ from all the others with 
marked significance. Some of the families of races appear to be separated, as 
before, and the differences between different races belonging to the same family 
are apparently all small. The inter-racial correlation between the chord and the 
subtense seems to be sensible but appreciably lower than that between the two 
arcs. 

Fig. 4 shows the arrangement provided by the two indices. As was
anticipated, the index 100 Ml₂/Ml₁ fails to arrange the series in any suggestive order.
It is distinguished from all the absolute measurements by the fact that it shows 
far more insignificant than significant differences in the comparison of all 
possible pairs of the means, though the Eskimo mean still diverges widely from 
all others available. The index derived from the horizontal chord and the 
maximum subtense to it distinguishes the types far more clearly: in this case 
the Eskimo mean is still extreme, but it differs insignificantly from the Chinese 
which is nearest to it. This measurement of curvature appears to distinguish 
the different families of races rather more effectively than any one of the four 
absolute measurements does. 
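For two independent series the probable error of a difference of means is the root-sum-square of their probable errors. A minimal sketch, applied to the 100 S/C means of Table I for the Eskimo and Chinese series; the function name `ratio_to_pe` is hypothetical, and the threshold of about three probable errors in the comment is a conventional rule of thumb assumed here (the paper itself treats 3.3 as indicating significance and tabulates ratios above 3.5 in Table IV).

```python
import math


def ratio_to_pe(mean_a, pe_a, mean_b, pe_b):
    """Ratio of a difference between independent means to its probable error."""
    return abs(mean_a - mean_b) / math.sqrt(pe_a ** 2 + pe_b ** 2)


if __name__ == "__main__":
    # 100 S/C, male series (Table I): Eskimo vs Chinese, right and left sides
    right = ratio_to_pe(23.2, 0.26, 22.5, 0.21)
    left = ratio_to_pe(23.6, 0.29, 22.7, 0.23)
    # Ratios below about 3 are conventionally not treated as significant
    print(f"right: {right:.1f}, left: {left:.1f}")
```

Both ratios (about 2.1 and 2.4) fall short of significance, which is consistent with the statement that the Eskimo mean differs insignificantly from the Chinese.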

A provisional estimate of the value of the malar-bone measurements for 
anthropological purposes can now be given. The material available for them is 
ample enough to show that racial types of cranium have malar bones which 
differ very appreciably in both size and shape. Of the four absolute measure- 


* The mean of 63-3 for the left horizontal are given by Dr von Bonin for 50 New Britain skulls 
is close to the Maori and Kanaka means. 
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ments and two indices considered, the index obtained by expressing the vertical 
as a percentage of the horizontal arc is quite the most constant. This appears 
to be of little value for purposes of racial classification. The other five 
characters seem to differentiate the racial types as effectively as most of the 
usual cranial measurements, and more effectively than several of these. The 
material available suggests that they are characters which tend to be constant 
for races belonging to the same family of races, but which provide suggestive 
orders when the different families are compared with one another. They are 
thus of the same nature as skin colour, the nasal index, measures of prognathism 
and of the “flatness” of the facial skeleton, and indices derived from the lengths
of the limb bones. Stature, the cephalic index and most calvarial measurements 
differ from these as they fail to make clear distinctions between the different 
families of races. Among races of the Old World the European and Indian, 
with their smaller and flatter malar bones, are at one extreme of the range: 
Oriental races have the largest and most curved bones, and negro and ancient 
Egyptian occupy intermediate positions. This arrangement accords in a general 
way with those provided by the indices measuring the “flatness” of the facial 
skeleton as a whole. In both cases, too, the Eskimo type has been found to
occupy markedly aberrant positions. Judging from the short series measured,
its malar bones are far larger than those of European, African, Asiatic and 
Oceanic types; they also show a greater degree of curvature, but the Eskimo 
type is most clearly distinguished by the fact that the heights of its malar bones 
(measured by the vertical arcs) are most peculiarly small compared with their 
antero-posterior lengths (measured by the horizontal arcs). This is of particular 
interest since the index measuring this ratio is the character least capable of 
distinguishing the other races from one another. The Eskimo type is detached, 
as it were, from the continuous system to which the others belong. It is generally
recognized to be peculiarly specialized, but none of its characters is known to
be more characteristic than these malar-bone measurements and the indices of
facial “flatness”. More of these data for Eskimo, Eastern Asiatic and American-Indian
cranial series would probably throw as much light on the question of the
affinities of the Eskimo population as any other new material. 

While they are incapable of providing by themselves any reliable classifica- 
tion of the races of modern man, there is every promise that the malar-bone 
measurements dealt with in this paper will prove to be a valuable aid for the 
purpose when considered in conjunction with other characters. Hence it is 
suggested that they might be included with advantage in the routine descriptions 
of racial series of crania. 
THE SAMPLING DISTRIBUTION OF THE CRITERION $\lambda_{H_1}$
WHEN THE HYPOTHESIS TESTED IS NOT TRUE


[EDITORIAL NOTE. The criterion $\lambda_{H_1}$ is appropriate to test the statistical hypothesis that the standard deviations of a character $x$ are the same in a number, say $k$, of different normal populations. In the form $L_1 = \lambda_{H_1}^{2/N}$ (where $N$ is the number of observations in the pooled samples), the criterion becomes the ratio of the weighted geometric to the weighted arithmetic mean of the $k$ sample variances. For the special case where the samples are of equal size, tables of 5 % and 1 % probability levels have been determined by an approximate method by Mr P. P. N. Nayer.*
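As a numerical illustration of the definition, the sketch below computes $L_1$ directly as the ratio of the weighted geometric to the weighted arithmetic mean of the sample variances (weights $n_i/N$), and also as $\lambda_{H_1}^{2/N}$, with $\lambda_{H_1}$ written in the product form derived in the note that follows. The function names and the variances used are illustrative assumptions, not taken from the paper.

```python
import math


def l1_statistic(sizes, variances):
    """L_1 = weighted geometric mean / weighted arithmetic mean of the s_i^2.

    `variances` are sample variances s_i^2 computed with divisor n_i,
    and the weights are n_i / N, N being the pooled sample size.
    """
    N = sum(sizes)
    log_gm = sum(n * math.log(v) for n, v in zip(sizes, variances)) / N
    am = sum(n * v for n, v in zip(sizes, variances)) / N
    return math.exp(log_gm) / am


def lambda_h1(sizes, variances):
    """lambda_{H_1} = prod (s_i^2)^(n_i/2) / (sum n_i s_i^2 / N)^(N/2)."""
    N = sum(sizes)
    am = sum(n * v for n, v in zip(sizes, variances)) / N
    log_lam = sum(0.5 * n * math.log(v) for n, v in zip(sizes, variances))
    log_lam -= 0.5 * N * math.log(am)
    return math.exp(log_lam)


sizes = [5, 10, 15]
variances = [2.0, 1.0, 1.5]  # illustrative values only
print(l1_statistic(sizes, variances))
print(lambda_h1(sizes, variances) ** (2 / sum(sizes)))  # the same number
```

By the inequality of the geometric and arithmetic means, $L_1 \le 1$, with equality only when all the sample variances coincide; small values of $L_1$ therefore tell against the hypothesis.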

It is important, however, not only to have available these significance levels, and so to control the risk of rejecting the hypothesis tested when it is true, but also to have some means of appreciating the chance that the test will detect real differences in population standard deviations when they exist. By this means it becomes possible to compare the efficiency of this and alternative tests. Over a year ago Dr S. S. Wilks promptly responded to a request of mine for help in this matter by providing the sampling moments of $L_1^{-1}$ in the general case where the population standard deviations are unequal. Since then Miss C. M. Thompson has compared his suggested Type III curves, having these moments, with a series of values of $L_1^{-1}$ calculated from experimental sampling data. The correspondence between experiment and Wilks's curves is excellent. Some further research into the matter is in progress. E.S.P.]


I. NOTE ON THE GENERAL SAMPLING MOMENTS OF $\lambda_{H_1}$
By S. S. WILKS


Neyman & Pearson† have considered in some detail the problem of deriving a criterion $\lambda_{H_1}$ for testing the hypothesis $H_1$ that $k$ samples have come from populations with equal variances but with means having any values whatever. They have discussed the sampling theory of $\lambda_{H_1}$ when $H_1$ is true. Here we shall be concerned with the more general case in which $H_1$ is not true. In order to

* Statistical Research Memoirs (Dept. of Statistics, University College, London), 1 (1936), p. 51. The substantial accuracy of the approximation involved has since been verified by Mr U. S. Nair, whose work on the subject will be published shortly.

† J. Neyman & E. S. Pearson, “On the Problem of k Samples,” Bull. int. Acad. Cracovie, Sér. A, 1931, pp. 460-81.
indicate more clearly the point of departure for this note we shall briefly describe
what has been done. 


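Let the $i$th sample ($i = 1, 2, \ldots, k$) of $n_i$ individuals be denoted by $\Sigma_i$, and suppose $\Sigma_i$ has been drawn at random from a normal population with mean $a_i$ and variance $\sigma_i^2$. Let $\bar{x}_i$ and $s_i^2$ be the mean and squared standard deviation of $\Sigma_i$.* In the present problem $\bar{x}_i$ and $s_i^2$ ($i = 1, 2, \ldots, k$) are the only functions of the individual observations with which we shall be concerned. The probability that $\bar{x}_i$ and $s_i^2$ will fall in the infinitesimal ranges $\bar{x}_i \pm \tfrac{1}{2}d\bar{x}_i$, $s_i^2 \pm \tfrac{1}{2}d(s_i^2)$ ($i = 1, 2, \ldots, k$) will be proportional to

$$C = \prod_{i=1}^{k} \sigma_i^{-n_i}\,(s_i^2)^{\frac{n_i-3}{2}} \exp\left\{-\frac{n_i}{2\sigma_i^2}\left[s_i^2 + (\bar{x}_i - a_i)^2\right]\right\} d\bar{x}_i\, d(s_i^2). \qquad (1)$$

We may regard the $k$ samples $\Sigma_1, \Sigma_2, \ldots, \Sigma_k$, described by the $k$ pairs of quantities $\bar{x}_i$, $s_i^2$, as having been drawn from the grand population (1).
Now $H_1$ is the hypothesis that the $\Sigma_i$ are from populations with

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2.$$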


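The set $\Omega$ of admissible populations consists of all populations (1) which could be obtained by taking all possible values of $\sigma_i^2$ and $a_i$. The set $\omega$ of populations is that subset of $\Omega$ for which the $\sigma$'s are equal. The criterion $\lambda_{H_1}$ is the ratio

$$\lambda_{H_1} = \frac{C(\omega_{\max})}{C(\Omega_{\max})},$$

where $C(\omega_{\max})$ is the maximum of $C$ taken over all populations in $\omega$ and $C(\Omega_{\max})$ is the maximum of $C$ taken over all populations in $\Omega$. Expressed in terms of the $s$'s,

$$\lambda_{H_1} = \frac{\prod_{i=1}^{k}(s_i^2)^{n_i/2}}{\left(\frac{1}{N}\sum_{i=1}^{k} n_i s_i^2\right)^{N/2}}, \qquad (2)$$

where $N = n_1 + n_2 + \cdots + n_k$.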


Neyman & Pearson‡ have considered the sampling properties of $\lambda_{H_1}$ under the assumption that $H_1$ is true, that is, that the samples are drawn from a member of $\omega$. We shall consider the sampling properties of $\lambda_{H_1}$ under the assumption that the samples are from any member of $\Omega$, in which the $\sigma$'s are not necessarily the same.

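Since we are using $\lambda_{H_1}$ as an instrument for ordering the data embodied in $\Sigma_1, \Sigma_2, \ldots, \Sigma_k$ with respect to the tenability of $H_1$, it is clearly immaterial whether $\lambda_{H_1}$ or any single-valued function of $\lambda_{H_1}$ be used. From a theoretical point of


* Here $s_i^2$ is defined by $n_i s_i^2 = \sum (x - \bar{x}_i)^2$, the sum being taken over the $n_i$ observations of $\Sigma_i$.
‡ Loc. cit. pp. 467-73.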
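view the problem can be somewhat simplified by using $\lambda_{H_1}^{-2/N}$. Since $\lambda_{H_1}$ is a function of $s_1^2, s_2^2, \ldots, s_k^2$, the $g$th moment of $\lambda_{H_1}^{-2/N}$ will be defined by the expression

$$\mu_g' = \int_0^\infty \cdots \int_0^\infty \lambda_{H_1}^{-2g/N}\, df_1\, df_2 \cdots df_k, \qquad (3)$$

where

$$df_i = \frac{1}{\Gamma\left(\frac{n_i-1}{2}\right)}\left(\frac{n_i}{2\sigma_i^2}\right)^{\frac{n_i-1}{2}} (s_i^2)^{\frac{n_i-3}{2}} \exp\left(-\frac{n_i s_i^2}{2\sigma_i^2}\right) d(s_i^2). \qquad (4)$$

Now it is clear from (2) that $\mu_g'$ will be the $g$th derivative with respect to $\theta$ of the following expression at $\theta = 0$,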
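For $g = 1, 2$ we find with little difficulty

In the important practical case in which $n_1 = n_2 = \cdots = n_k = n$, (6) and (7) reduce to

$$\mu_1' = \left[\frac{\Gamma\left(\frac{n-1}{2}-\frac{1}{k}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\right]^k \left(\frac{n-1}{2}-\frac{1}{k}\right)\frac{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2}{k\,(\sigma_1^2\sigma_2^2\cdots\sigma_k^2)^{1/k}}, \qquad (8)$$

$$\mu_2' = \left[\frac{\Gamma\left(\frac{n-1}{2}-\frac{2}{k}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\right]^k \left(\frac{n-1}{2}-\frac{2}{k}\right)\frac{\left(\frac{n+1}{2}-\frac{2}{k}\right)\sum_i \sigma_i^4 + \left(\frac{n-1}{2}-\frac{2}{k}\right)\sum\sum_{i\neq j}\sigma_i^2\sigma_j^2}{k^2\,(\sigma_1^2\sigma_2^2\cdots\sigma_k^2)^{2/k}}. \qquad (9)$$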
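When the hypothesis $H_1$ is true, that is, when $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$, it can be easily verified that

$$\mu_g' = \frac{\prod_{i=1}^{k} n_i^{g n_i/N}}{N^{g}}\cdot\frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N-k}{2}-g\right)}\prod_{i=1}^{k}\frac{\Gamma\left(\frac{n_i-1}{2}-\frac{g n_i}{N}\right)}{\Gamma\left(\frac{n_i-1}{2}\right)}, \qquad (10)$$

as obtained by Neyman & Pearson. $\mu_g'$ will exist for all values of $g$ for which the arguments of the gamma functions are positive.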
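Moments of this gamma-function form are easy to evaluate on a computer. The sketch below is our own Dirichlet-based derivation, offered as an assumption rather than as Wilks's printed working: it returns $E[L_1^g]$ under $H_1$, so that a negative $g$ gives the moments $\mu_g'$ of $\lambda_{H_1}^{-2/N} = L_1^{-1}$. For $n_1 = 5$, $n_2 = 10$, $n_3 = 15$ and $g = 1, 2$ it reproduces the values $0.9225\,2768$ and $0.8561\,9647$ that are quoted for $L_1$ in the investigation which follows.

```python
import math


def l1_moment_h1(sizes, g):
    """g-th moment about zero of L_1 = lambda_{H_1}^(2/N) when H_1 is true.

    Assumed derivation: n_i s_i^2 / (2 sigma^2) are independent Gamma((n_i - 1)/2)
    variables, so their proportions are Dirichlet.  This gives
        E[L_1^g] = N^g prod n_i^(-g n_i/N) * Gamma(M)/Gamma(M + g)
                   * prod Gamma(m_i + g n_i/N)/Gamma(m_i),
    with m_i = (n_i - 1)/2 and M = sum m_i = (N - k)/2.
    Negative g gives the moments of 1/L_1 = lambda_{H_1}^(-2/N).
    """
    N = sum(sizes)
    M = (N - len(sizes)) / 2
    args = [(n - 1) / 2 + g * n / N for n in sizes] + [M + g]
    if min(args) <= 0:
        raise ValueError("moment does not exist: a gamma argument is not positive")
    log_mu = g * math.log(N) - sum(g * n / N * math.log(n) for n in sizes)
    log_mu += math.lgamma(M) - math.lgamma(M + g)
    for n in sizes:
        m_i = (n - 1) / 2
        log_mu += math.lgamma(m_i + g * n / N) - math.lgamma(m_i)
    return math.exp(log_mu)


sizes = [5, 10, 15]
print(l1_moment_h1(sizes, 1), l1_moment_h1(sizes, 2))   # moments of L_1
print(l1_moment_h1(sizes, -1))                           # mean of 1/L_1
```

Working with `math.lgamma` keeps the products of large gamma functions within floating-point range.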

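The higher moments of $\lambda_{H_1}^{-2/N}$ become more and more complicated, so that there is little hope of finding a workable form of the exact distribution function of $\lambda_{H_1}^{-2/N}$. Therefore, since $\lambda_{H_1}^{-2/N}$ has the range 1 to $\infty$, it appears that its distribution could be reasonably approximated by fitting a Type III curve by means of the first two moments. Let the form of the curve be

$$y = \frac{b^a}{\Gamma(a)}(x-1)^{a-1}e^{-b(x-1)}, \qquad x = \lambda_{H_1}^{-2/N} \ge 1. \qquad (11)$$

Equating the first two moments of (11) about the origin to $\mu_1'$ and $\mu_2'$ and solving for $a$ and $b$, we find

$$a = \frac{(\mu_1' - 1)^2}{\mu_2' - \mu_1'^2}, \qquad b = \frac{\mu_1' - 1}{\mu_2' - \mu_1'^2}. \qquad (12)$$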
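The two-moment fit can be sketched in a few lines. We assume here that the curve starts at $x = 1$ with shape $a$ and rate $b$, so that its mean is $1 + a/b$ and its variance $a/b^2$. The check values are the theoretical moments listed for case 1 of Table I in the investigation below; `type_iii_constants` is a hypothetical helper name.

```python
def type_iii_constants(mu1_raw, mu2_central):
    """Constants a, b of y = b^a/Gamma(a) (x - 1)^(a - 1) exp(-b (x - 1)), x >= 1.

    Solves 1 + a/b = mu1_raw and a/b^2 = mu2_central (moment matching).
    """
    excess = mu1_raw - 1.0
    if excess <= 0 or mu2_central <= 0:
        raise ValueError("need a mean above 1 and a positive variance")
    return excess ** 2 / mu2_central, excess / mu2_central


a, b = type_iii_constants(1.1462, 0.0196)   # case 1, theoretical moments
print(round(a, 3), round(b, 3))
```

With moments rounded to four decimals the result (about 1.091 and 7.459) differs slightly from the tabled 1.0891 and 7.4493, which were presumably computed from unrounded moments.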


II. AN INVESTIGATION INTO THE ADEQUACY OF DR WILKS'S CURVES


By CATHERINE M. THOMPSON 


The basic data used consisted of 500 samples of (i) $n_1 = 5$, (ii) $n_2 = 10$ and (iii) $n_3 = 15$ from a common normal population, obtained with the help of Tippett's Random Numbers. The values of, say, $v = \sum(x - \bar{x})^2/n$ had already been calculated for each of the 1500 samples for another purpose. By multiplying the values of $v$ by appropriate factors it was possible to obtain 500 sets of values $s_1^2$, $s_2^2$ and $s_3^2$ from populations having unequal variances $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$, respectively.
For example, if $v_1$, $v_2$ and $v_3$ denote the basic sample variances, if the common population standard deviation is unity and if $N = n_1 + n_2 + n_3 = 30$, then

$$L_1^{-1} = \frac{\frac{1}{N}\sum_{i=1}^{3} n_i \sigma_i^2 v_i}{\prod_{i=1}^{3}(\sigma_i^2 v_i)^{n_i/N}}.$$


The result depends only on the relative magnitudes of the three values of $\sigma^2$. Six different cases were taken and for each the 500 values of $L_1^{-1}$ were computed; since the same basic values of $v_1$, $v_2$ and $v_3$ were used, the six resulting frequency distributions of $L_1^{-1}$ are not completely independent, but their relationship is of no simple character. The cases taken were as follows:


Case    σ1² : σ2² : σ3²
  1       2 : 1 : 2
  2       1 : 2 : 2
  3       2 : 1 : 1
  4       1 : 2 : 1
  5       4 : 2 : 1
  6      10 : 2 : 1


Since the three values of $n_i$ are unequal, the first four cases correspond to different situations. The resulting histograms for the six distributions of $L_1^{-1}$ are shown in Figs. 1 and 2.
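The rescaling device is easy to imitate by simulation. The sketch below substitutes Python's seeded pseudo-random generator for Tippett's tables (an assumption; the original samples are not reproduced). For each trial it draws the three basic variances $v_i$ with divisor $n_i$, multiplies them by the chosen variance ratios, and records $L_1^{-1}$. `simulate_l1_inverse` is a hypothetical name.

```python
import math
import random


def simulate_l1_inverse(ratios, sizes=(5, 10, 15), trials=500, seed=1):
    """Monte Carlo values of 1/L_1 when sample i comes from a normal
    population whose variance is proportional to ratios[i]."""
    rng = random.Random(seed)
    N = sum(sizes)
    values = []
    for _ in range(trials):
        s2 = []
        for n, ratio in zip(sizes, ratios):
            xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
            xbar = sum(xs) / n
            v = sum((x - xbar) ** 2 for x in xs) / n   # basic variance, divisor n
            s2.append(ratio * v)                       # rescaled as in the text
        am = sum(n * v for n, v in zip(sizes, s2)) / N
        log_gm = sum(n * math.log(v) for n, v in zip(sizes, s2)) / N
        values.append(am / math.exp(log_gm))
    return values


vals = simulate_l1_inverse((2.0, 1.0, 2.0))   # case 1
print(sum(vals) / len(vals))
```

Every simulated value is at least 1, since a weighted arithmetic mean never falls below the corresponding geometric mean. With a large number of trials, the average settles near the theoretical first moment given for case 1 in Table I.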

For each case the following steps were then performed: 


(i) The appropriate moments $\mu_1'$ and $\mu_2'$, and hence $\mu_2 = \mu_2' - \mu_1'^2$, were calculated from Wilks's equations (6) and (7).


(ii) These moment values were inserted into his equation (12) to give the constants $a$ and $b$.


(iii) These constants were inserted in turn into his Type III equation (11), the curves drawn and frequencies calculated to enable a $\chi^2$ test to be applied.


A summary of results is shown in Table I. The column headed $P\{\chi^2 > \chi_0^2\}$ shows the result of applying the $\chi^2$ test for goodness of fit, $\chi_0^2$ being the observed value. The agreement between the theoretical curves and the experimental sampling results is very close, and suggests that the method of approximation to the unknown true distribution of $L_1^{-1}$ is most satisfactory for practical purposes.

Of course the investigation only covers the case $k = 3$ and $n_1 = 5$, $n_2 = 10$, $n_3 = 15$, but at any rate for larger samples one would not expect worse agreement.

It is also of interest to investigate, in the six cases, what would be the chance of detecting from the samples that $\sigma_1$, $\sigma_2$ and $\sigma_3$ were not all equal. Suppose that we used a rule of rejecting the hypothesis, $H_1$, that $\sigma_1 = \sigma_2 = \sigma_3$ whenever $L_1$
Fig. 1. Experimental distributions of 500 values of $L_1^{-1}$ compared with Wilks's curves ($n_1 = 5$, $n_2 = 10$, $n_3 = 15$): histograms for Case 1 ($\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = 2 : 1 : 2$), Case 2 ($1 : 2 : 2$) and Case 3 ($2 : 1 : 1$), on a scale of $L_1^{-1}$ from 1.0 to 2.0, with the 5 % limit marked. N.B. The 5 % limit is appropriate for the case where the hypothesis tested is true.

Fig. 2. Experimental distributions of 500 values of $L_1^{-1}$ compared with Wilks's curves ($n_1 = 5$, $n_2 = 10$, $n_3 = 15$): histograms for Case 4 ($1 : 2 : 1$), Case 5 ($4 : 2 : 1$) and Case 6 ($10 : 2 : 1$), with the 5 % limit marked. N.B. The 5 % limit is appropriate for the case where the hypothesis tested is true.
TABLE I
Summary of results of calculations

Case  σ1²:σ2²:σ3²  Moment   Theory   Observed     a        b      P{χ² > χ0²}   Power*
 1       2:1:2      μ'1     1.1462    1.1426    1.0891   7.4493      .615        .136
                    μ2      0.0196    0.0168
 2       1:2:2      μ'1     1.1356    1.1325                                     .120
                    μ2      0.0170    0.0160
 3       2:1:1      μ'1     1.1152    1.1082    0.8197   7.1144      .525        .093
                    μ2      0.0162    0.0115
 4       1:2:1      μ'1     1.1553    1.1495    0.9747   6.2760      .853        .158
                    μ2      0.0247    0.0230
 5       4:2:1      μ'1     1.2195    1.2071
                    μ2      0.0438    0.0330
 6      10:2:1      μ'1     1.5663    1.5283    1.2543   2.2148      .389        .646
                    μ2      0.2557    0.1968

* Using 5 % significance level.


falls beyond the 5 % level, say $L_1(0.05)$. It is first necessary to calculate this level for the particular case considered. To do this, Neyman & Pearson's Type I
approximation† to the distribution of $L_1$ if $H_1$ is true, namely

$$p(L_1) = \frac{\Gamma(m_1 + m_2)}{\Gamma(m_1)\,\Gamma(m_2)}\,L_1^{m_1-1}(1 - L_1)^{m_2-1}, \qquad 0 \le L_1 \le 1, \qquad (2)$$

may be used. The true sampling moments of $L_1$ about zero are in this case

$$\mu_1' = 0.9225\,2768, \qquad \mu_2' = 0.8561\,9647. \qquad (3)$$

† See reference on p. 124 above and also P. P. N. Nayer, loc. cit.

The values for $m_1$ and $m_2$ are then chosen so that the first two moments of the distribution (2) have the values (3), that is to say,

$$m_1 = \frac{\mu_1'(\mu_1' - \mu_2')}{\mu_2' - (\mu_1')^2} = 11.90710, \qquad m_2 = \frac{(1 - \mu_1')(\mu_1' - \mu_2')}{\mu_2' - (\mu_1')^2} = 0.99994.$$

In this case the value of $m_2$ is so close to unity that we may use the approximation

$$p(L_1) = m_1 L_1^{m_1-1},$$

and hence for the 5 % limit

$$0.05 = \int_0^{L_1(0.05)} p(L_1)\,dL_1 = \{L_1(0.05)\}^{m_1},$$

and $L_1(0.05) = 0.778$.
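The arithmetic of this paragraph can be checked directly. The sketch fits a Type I (beta) curve on $[0, 1]$ by matching its first two moments about zero, then uses the simplification $m_2 \approx 1$, under which $P(L_1 < c) = c^{m_1}$, to find the lower 5 % point. The helper name is hypothetical; the input moments are the values (3) quoted above.

```python
def type_i_constants(mu1, mu2):
    """Beta (Type I) parameters m1, m2 on [0, 1] whose mean is mu1 and whose
    second moment about zero is mu2 (method of moments)."""
    var = mu2 - mu1 ** 2
    common = (mu1 - mu2) / var
    return mu1 * common, (1.0 - mu1) * common


m1, m2 = type_i_constants(0.92252768, 0.85619647)
# With m2 close to 1 the density is nearly m1 * L**(m1 - 1), so P(L_1 < c) = c**m1.
lower_5pct = 0.05 ** (1.0 / m1)
print(m1, m2, lower_5pct, 1.0 / lower_5pct)
```

The reciprocal of the 5 % point is the critical level on the scale of $L_1^{-1}$ that is marked in the figures.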


This limit, or rather the limit $\{L_1(0.05)\}^{-1} = 1.286$, has been drawn in each of the diagrams. Neyman & Pearson* have defined the power of a test with regard to an alternative hypothesis as the probability that it will reject the hypothesis tested, $H_1$, when the alternative is true. Thus if a 5 % level of significance is used, the power of the $L_1$ test in the six cases illustrated is given by the proportionate area under Wilks's curves lying to the right of the critical levels drawn at $L_1^{-1} = 1.286$. The six values of this probability, obtained by quadrature, are given in Table I.
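The quadrature can be replaced by an incomplete gamma function, because the area of the Type III curve (11) beyond $x = 1.286$ is $1 - P(a,\, b \times 0.286)$, where $P$ is the regularized lower incomplete gamma function. The sketch below evaluates $P$ by its standard power series. This is a method we substitute for the original numerical quadrature; the function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14):
    """Regularized lower incomplete gamma P(a, x) by its power series (a > 0, x >= 0)."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    j = 0
    while term > tol * total:
        j += 1
        term *= x / (a + j)        # next term x^j / (a (a+1) ... (a+j))
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def power_type_iii(a, b, critical=1.286):
    """Area of y = b^a/Gamma(a) (x-1)^(a-1) e^{-b(x-1)} lying to the right of `critical`."""
    return 1.0 - reg_lower_gamma(a, b * (critical - 1.0))


print(power_type_iii(1.0891, 7.4493))   # case 1
print(power_type_iii(1.2543, 2.2148))   # case 6
```

Using the constants $a$ and $b$ of Table I, the results agree with the tabled powers to about the third decimal.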

It is interesting to note how, owing to the three samples being of unequal size, the power is different in each of the first four cases. Thus when the samples of 10 and 15 come from populations with the same variance, $\sigma_2^2 = \sigma_3^2$, and the smallest sample of 5 from a population with twice the variance (case 3), the test is least likely to detect the difference. It is somewhat more likely to do so when $\sigma_1^2 = \tfrac{1}{2}\sigma_2^2 = \tfrac{1}{2}\sigma_3^2$ (case 2). We also see how difficult it is to detect differences in population variances when only small samples are available; even in case 6, where $\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = 10 : 2 : 1$, the odds are only 2 to 1 in favour of our being able to discover the difference, using the $L_1$ test.


* Neyman & Pearson, “Contribution to the Theory of testing Statistical Hypotheses,” 
Statistical Research Memoirs, 1 (1936), pp. 1-37. 
THE EXACT VALUE OF THE MOMENTS OF THE DISTRIBUTION
OF $\chi^2$, USED AS A TEST OF GOODNESS OF FIT,
WHEN EXPECTATIONS ARE SMALL


By J. B. S. HALDANE, F.R.S.


1. INTRODUCTION TO GENERAL METHOD 


In genetical practice we are constantly presented with large numbers of small 
samples from populations consisting of several well-defined classes. For example 
in the mouse we can readily obtain hundreds of litters containing anything from
one up to about twelve members. Their totals may agree satisfactorily with
expectation on a Mendelian basis, for example ¾ coloured, ¼ white, or 9/16 grey, 3/16
black, ¼ white. But we desire to know whether the individual litters can be
regarded as random samples from such a population. In addition the problem of
homogeneity may arise. That is to say the population as a whole may not conform 
to any particular expectation. But we may desire to know whether the litters can 
be regarded as random samples of the population given by the totals. 

It has long been known that when the numbers expected in any observation
are small, the distribution of $\chi^2$ departs from that given by Pearson (1900). The
mean appears sometimes, but not always, to be equal to the number of degrees of
freedom. But the variance is no longer exactly equal to twice that number. 
Exact expressions for it in certain cases have been given by Pearson (1932) and 
Cochran (1936). These are based on an ingenious application of the theory of 
multiple contingency by Pearson. 

It will be shown in this paper that the first few moments can often be calculated 
by entirely elementary methods involving nothing more advanced than the 
multinomial theorem. In an accompanying paper (Grüneberg and Haldane,
1937) they will be applied to actual data on mice.

We first study the distribution of $\chi^2$ in an $n$-fold table with $n-1$ degrees of
freedom, then in an $(m \times n)$-fold table with $m(n-1)$ degrees of freedom. For
genetical work we are particularly interested in the $(n \times 2)$-fold table with $n$
degrees of freedom. As a limiting case of the 2-fold table with 1 degree of freedom
we derive the moments of the variance of samples from a Poisson series, and
thence the distribution of $\chi^2$ in an $n$-fold table with $n$ degrees of freedom. The
important case of the (m x n)-fold table with (m—1) (n—1) degrees of freedom 
remains to be investigated. 

Consider a sample of $s$ individuals falling into $n$ classes. Let the expected and observed numbers in these classes be:

Expected   $p_1 s, p_2 s, \ldots, p_n s$,
Observed   $a_1, a_2, \ldots, a_n$,

where $\sum_{i=1}^{n} p_i = 1$, $\sum_{i=1}^{n} a_i = s$. It is assumed that we are sampling from an infinite

population, or that if it is finite, any individual observed is replaced before the next individual is chosen at random. We shall use the notation $s_r = \dfrac{s!}{(s-r)!}$, and $E[x]$ to denote the expected value of $x$, or $\bar{x}$.

The probability of obtaining in a sample exactly $a_1, a_2, a_3$, etc., members of the $n$ classes is $s!\prod_{i=1}^{n}\dfrac{p_i^{a_i}}{a_i!}$. Hence the expected value of

$$\frac{a_1!\,a_2!\,a_3!}{(a_1-\alpha_1)!\,(a_2-\alpha_2)!\,(a_3-\alpha_3)!}$$

is the sum of this quantity multiplied by $s!\prod_i p_i^{a_i}/a_i!$, summation being over all permissible sets of values of $a_1$, $a_2$ and $a_3$, i.e. for all zero or positive integral values satisfying the condition $\sum a_i = s$.* Making use of the multinomial theorem it is seen that this sum is $p_1^{\alpha_1}p_2^{\alpha_2}p_3^{\alpha_3}\,s_{\alpha_1+\alpha_2+\alpha_3}$. But we can readily express any power of $a_i$, or any product of powers such as $a_i^2 a_j^2$, as a sum of expressions of the form $\dfrac{a_i!}{(a_i-\alpha_i)!}$ and of their products. Hence we can express the expected value of any power or product of powers as a sum of terms of the form $p_i^{\alpha}p_j^{\beta}p_k^{\gamma}\,s_{\alpha+\beta+\gamma}$. The following expressions for the expected values of powers and their products will be required in the analysis which follows:


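$$E[a_i^2] = p_i^2 s_2 + p_i s_1,$$
$$E[a_i^4] = p_i^4 s_4 + 6p_i^3 s_3 + 7p_i^2 s_2 + p_i s_1,$$
$$E[a_i^6] = p_i^6 s_6 + 15p_i^5 s_5 + 65p_i^4 s_4 + 90p_i^3 s_3 + 31p_i^2 s_2 + p_i s_1,$$
$$E[a_i^8] = p_i^8 s_8 + 28p_i^7 s_7 + 266p_i^6 s_6 + 1050p_i^5 s_5 + 1701p_i^4 s_4 + 966p_i^3 s_3 + 127p_i^2 s_2 + p_i s_1.$$

In general $E[a_i^r] = \sum_{u=1}^{r} {}_rU_u\, p_i^u s_u$, where ${}_rU_u = \Delta^u 0^r / u!$.†

$$E[a_i^2 a_j^2] = p_i^2p_j^2 s_4 + (p_i^2p_j + p_ip_j^2)s_3 + p_ip_j s_2,$$
$$E[a_i^4 a_j^2] = p_i^4p_j^2 s_6 + (p_i^4p_j + 6p_i^3p_j^2)s_5 + (6p_i^3p_j + 7p_i^2p_j^2)s_4 + (7p_i^2p_j + p_ip_j^2)s_3 + p_ip_j s_2,$$
$$E[a_i^2 a_j^2 a_k^2] = p_i^2p_j^2p_k^2 s_6 + (p_i^2p_j^2p_k + p_i^2p_jp_k^2 + p_ip_j^2p_k^2)s_5 + (p_i^2p_jp_k + p_ip_j^2p_k + p_ip_jp_k^2)s_4 + p_ip_jp_k s_3.$$

* The value of $a_i!/(a_i-\alpha_i)! = a_i(a_i-1)\cdots(a_i-\alpha_i+1)$ is of course zero for $a_i < \alpha_i$.

† Mr C. Eisenhart has kindly pointed out to me that these coefficients are differences of powers of zero divided by appropriate factorials.

$$E[a_i^6 a_j^2] = p_i^6p_j^2 s_8 + (p_i^6p_j + 15p_i^5p_j^2)s_7 + (15p_i^5p_j + 65p_i^4p_j^2)s_6 + (65p_i^4p_j + 90p_i^3p_j^2)s_5 + (90p_i^3p_j + 31p_i^2p_j^2)s_4 + (31p_i^2p_j + p_ip_j^2)s_3 + p_ip_j s_2,$$
$$E[a_i^4 a_j^4] = p_i^4p_j^4 s_8 + 6(p_i^4p_j^3 + p_i^3p_j^4)s_7 + (7p_i^4p_j^2 + 36p_i^3p_j^3 + 7p_i^2p_j^4)s_6 + (p_i^4p_j + 42p_i^3p_j^2 + 42p_i^2p_j^3 + p_ip_j^4)s_5 + (6p_i^3p_j + 49p_i^2p_j^2 + 6p_ip_j^3)s_4 + 7(p_i^2p_j + p_ip_j^2)s_3 + p_ip_j s_2,$$
$$E[a_i^4 a_j^2 a_k^2] = p_i^4p_j^2p_k^2 s_8 + (p_i^4p_j^2p_k + p_i^4p_jp_k^2 + 6p_i^3p_j^2p_k^2)s_7 + (p_i^4p_jp_k + 6p_i^3p_j^2p_k + 6p_i^3p_jp_k^2 + 7p_i^2p_j^2p_k^2)s_6 + (6p_i^3p_jp_k + 7p_i^2p_j^2p_k + 7p_i^2p_jp_k^2 + p_ip_j^2p_k^2)s_5 + (7p_i^2p_jp_k + p_ip_j^2p_k + p_ip_jp_k^2)s_4 + p_ip_jp_k s_3,$$
$$E[a_i^2 a_j^2 a_k^2 a_l^2] = p_i^2p_j^2p_k^2p_l^2 s_8 + (p_i^2p_j^2p_k^2p_l + p_i^2p_j^2p_kp_l^2 + p_i^2p_jp_k^2p_l^2 + p_ip_j^2p_k^2p_l^2)s_7 + (p_i^2p_j^2p_kp_l + p_i^2p_jp_k^2p_l + p_i^2p_jp_kp_l^2 + p_ip_j^2p_k^2p_l + p_ip_j^2p_kp_l^2 + p_ip_jp_k^2p_l^2)s_6 + (p_i^2p_jp_kp_l + p_ip_j^2p_kp_l + p_ip_jp_k^2p_l + p_ip_jp_kp_l^2)s_5 + p_ip_jp_kp_l s_4.$$

In the special case where $p_1 = p_2 = \cdots = n^{-1}$ we have such expressions as

$$E[a_i^4 a_j^2 a_k^2] = n^{-8}s_8 + 8n^{-7}s_7 + 20n^{-6}s_6 + 21n^{-5}s_5 + 9n^{-4}s_4 + n^{-3}s_3.$$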
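The single-class expansions can be generated mechanically, because the count $a_i$ in one class is binomially distributed and ${}_rU_u$ is a Stirling number of the second kind. The sketch below builds $E[a_i^r]$ from those numbers and checks it against a brute-force sum over the binomial distribution. The helper names are hypothetical; the mixed expectations follow in the same way by multiplying the expansions for each class.

```python
import math


def falling(s, r):
    """s_r = s!/(s - r)!, which is zero when r > s."""
    out = 1
    for j in range(r):
        out *= s - j
    return out


def stirling2(r, u):
    """Stirling number of the second kind, i.e. Delta^u 0^r / u!."""
    table = [[0] * (u + 1) for _ in range(r + 1)]
    table[0][0] = 1
    for i in range(1, r + 1):
        for j in range(1, min(i, u) + 1):
            # recurrence S(i, j) = j S(i-1, j) + S(i-1, j-1)
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[r][u]


def power_moment(p, s, r):
    """E[a^r] = sum_u S(r, u) p^u s_u for a class of probability p in a sample of s."""
    return sum(stirling2(r, u) * p ** u * falling(s, u) for u in range(1, r + 1))


def power_moment_brute(p, s, r):
    """The same expectation summed directly over the binomial distribution of a."""
    return sum(math.comb(s, a) * p ** a * (1 - p) ** (s - a) * a ** r
               for a in range(s + 1))


print([stirling2(8, u) for u in range(1, 9)])   # coefficients of E[a_i^8]
print(power_moment(0.3, 6, 4), power_moment_brute(0.3, 6, 4))
```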
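In what follows, $\sum$ denotes summation over all values of $i$, $\sum\sum$ summation over all pairs of unequal values of $i$ and $j$. The following notation is used for sums of reciprocals: $R_1 = \sum p_i^{-1}$, $R_2 = \sum p_i^{-2}$, $R_3 = \sum p_i^{-3}$.

Writing $\chi^2 = \sum (a_i - sp_i)^2/(sp_i) = s^{-1}\sum p_i^{-1}a_i^2 - s$, we have

$$E[\chi^2] = s^{-1}(s_2 + ns) - s = n - 1,$$

$$E[(\chi^2)^2] = s^{-2}E\Big[\sum p_i^{-2}a_i^4 + \sum\sum p_i^{-1}p_j^{-1}a_i^2a_j^2\Big] - 2E\Big[\sum p_i^{-1}a_i^2\Big] + s^2$$
$$= s^{-2}[s_4 + (2n+4)s_3 + (n^2+6n)s_2 + R_1 s] - 2(s_2 + ns) + s^2$$
$$= n^2 - 1 + [R_1 - (n^2 + 2n - 2)]s^{-1}.$$

Hence

$$\mu_2 = 2(n-1) + [R_1 - (n^2 + 2n - 2)]s^{-1}.$$

This agrees with Pearson's (1932) result. The calculation of the higher moments is somewhat tedious. It can be greatly simplified by the following device. The moments are calculated for the special case when all $p_i$'s are equal. The terms in the general case which involve sums of negative powers of the $p_i$'s are then calculated separately, and an adjustment made to the previous formulae, since when $p_i = n^{-1}$, $R_1 = n^2$, $R_2 = n^3$, $R_3 = n^4$.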
2. CASE OF AN $n$-FOLD CLASSIFICATION ($n-1$ DEGREES OF FREEDOM)


For a sample of $s$ divided into $n$ classes of which the expectations are equal, we find:

$$E\Big[\sum a_i^2\Big] = n^{-1}(s_2 + ns) = n^{-1}[s^2 + (n-1)s],$$

$$E\Big[\big(\sum a_i^2\big)^2\Big] = n^{-2}[s_4 + (2n+4)s_3 + (n^2+6n)s_2 + n^2 s] = n^{-2}[s^4 + 2(n-1)s^3 + (n^2-1)s^2 - 2(n-1)s],$$

$$E\Big[\big(\sum a_i^2\big)^3\Big] = n^{-3}[s_6 + (3n+12)s_5 + (3n^2+30n+32)s_4 + (n^3+21n^2+68n)s_3 + (3n^3+28n^2)s_2 + n^3 s]$$
$$= n^{-3}[s^6 + 3(n-1)s^5 + 3(n^2-1)s^4 + (n^3+3n^2-7n+3)s^3 - 2(n^2+12n-13)s^2 - 4(n^2-7n+6)s],$$

$$E\Big[\big(\sum a_i^2\big)^4\Big] = n^{-4}[s_8 + (4n+24)s_7 + (6n^2+84n+176)s_6 + (4n^3+102n^2+544n+400)s_5 + (n^4+48n^3+516n^2+1136n)s_4 + (6n^4+152n^3+808n^2)s_3 + (7n^4+120n^3)s_2 + n^4 s]$$
$$= n^{-4}[s^8 + 4(n-1)s^7 + 6(n^2-1)s^6 + 4(n^3+3n^2-4n)s^5 + (n^4+8n^3+6n^2-104n+89)s^4 + 4(n^3-17n^2-45n+61)s^3 - 4(2n^3+51n^2-314n+261)s^2 - 8(n^3-31n^2+120n-90)s].$$

Hence

$$\mu_2' = n^2 - 1 - 2(n-1)s^{-1},$$
$$\mu_3' = (n+3)(n+1)(n-1) - 2(n-1)(n+13)s^{-1} - 4(n-1)(n-6)s^{-2},$$
$$\mu_4' = (n+5)(n+3)(n+1)(n-1) + 4(n-1)(n^2-12n-85)s^{-1} - 4(n-1)(2n^2+53n-261)s^{-2} - 8(n-1)(n^2-30n+90)s^{-3}.$$

The first four moments and cumulants are:

$$\mu_1' = \kappa_1 = n - 1,$$
$$\mu_2 = \kappa_2 = 2(n-1) - 2(n-1)s^{-1} = 2s^{-1}(n-1)(s-1),$$
$$\mu_3 = \kappa_3 = 8(n-1) + 4(n-1)(n-8)s^{-1} - 4(n-1)(n-6)s^{-2} = 4s^{-2}(n-1)(s-1)(n+2s-6),$$
$$\mu_4 = 12(n-1)(n+3) + 24(n-1)(3n-19)s^{-1} + 4(n-1)(2n^2-81n+285)s^{-2} - 8(n-1)(n^2-30n+90)s^{-3},$$
$$\kappa_4 = 48(n-1) + 96(n-1)(n-5)s^{-1} + 8(n-1)(n^2-42n+144)s^{-2} - 8(n-1)(n^2-30n+90)s^{-3}$$
$$= 8s^{-3}(n-1)(s-1)[n^2 + 6(2s-5)n + 6(s^2-9s+15)], \qquad (1)$$

where $\kappa_4 = \mu_4 - 3\mu_2^2$. Also

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{2(n+2s-6)^2}{s(n-1)(s-1)}.$$


In the general case where not all the $p_i$'s are equal, we find:

$$E\Big[\big(\sum p_i^{-1}a_i^2\big)^2\Big] = s_4 + (2n+4)s_3 + (n^2+6n)s_2 + R_1 s = s^4 + 2(n-1)s^3 + (n^2-1)s^2 + (R_1 - n^2 - 2n + 2)s,$$

$$E\Big[\big(\sum p_i^{-1}a_i^2\big)^3\Big] = s_6 + (3n+12)s_5 + (3n^2+30n+32)s_4 + (3R_1 + n^3 + 18n^2 + 68n)s_3 + (3n+28)R_1 s_2 + R_2 s$$
$$= s^6 + 3(n-1)s^5 + 3(n^2-1)s^4 + (3R_1 + n^3 - 7n + 3)s^3 + [(3n+19)R_1 - 3n^3 - 21n^2 - 24n + 26]s^2$$
$$+ [R_2 - (3n+22)R_1 + 2n^3 + 18n^2 + 28n - 24]s,$$

$$E\Big[\big(\sum p_i^{-1}a_i^2\big)^4\Big] = s_8 + (4n+24)s_7 + (6n^2+84n+176)s_6 + (6R_1 + 4n^3 + 96n^2 + 544n + 400)s_5$$
$$+ [(12n+136)R_1 + n^4 + 36n^3 + 380n^2 + 1136n]s_4 + [4R_2 + (6n^2+148n+808)R_1]s_3 + [(4n+120)R_2 + 3R_1^2]s_2 + R_3 s$$
$$= s^8 + 4(n-1)s^7 + 6(n^2-1)s^6 + (6R_1 + 4n^3 + 6n^2 - 16n)s^5 + [4(3n+19)R_1 + n^4 - 4n^3 - 70n^2 - 104n + 89]s^4$$
$$+ [4R_2 + (6n^2+76n+202)R_1 - 6n^4 - 76n^3 - 270n^2 - 180n + 244]s^3$$
$$+ [(4n+108)R_2 + 3R_1^2 - (18n^2+312n+1228)R_1 + 11n^4 + 196n^3 + 1024n^2 + 1256n - 1044]s^2$$
$$+ [R_3 - (4n+112)R_2 - 3R_1^2 + (12n^2+224n+944)R_1 - 6n^4 - 120n^3 - 696n^2 - 960n + 720]s.$$

But

$$\mu_2' = (n+1)(n-1) + [R_1 - (n^2+2n-2)]s^{-1},$$
$$\mu_3' = (n+3)(n+1)(n-1) + [(3n+19)R_1 - (3n^3+21n^2+24n-26)]s^{-1} + [R_2 - (3n+22)R_1 + 2(n^3+9n^2+14n-12)]s^{-2},$$
$$\mu_4' = (n+5)(n+3)(n+1)(n-1) + 2[(3n^2+44n+145)R_1 - (3n^4+42n^3+171n^2+146n-170)]s^{-1}$$
$$+ [4(n+27)R_2 + 3R_1^2 - 2(9n^2+156n+614)R_1 + 11n^4 + 196n^3 + 1024n^2 + 1256n - 1044]s^{-2}$$
$$+ [R_3 - 4(n+28)R_2 - 3R_1^2 + 4(3n^2+56n+236)R_1 - 6(n^4+20n^3+116n^2+160n-120)]s^{-3}.$$
Therefore:

$$\mu_1' = \kappa_1 = n - 1,$$
$$\mu_2 = \kappa_2 = 2(n-1) + [R_1 - (n^2+2n-2)]s^{-1},$$
$$\mu_3 = \kappa_3 = 8(n-1) + 2[11R_1 - (9n^2+18n-16)]s^{-1} + [R_2 - (3n+22)R_1 + 2(n^3+9n^2+14n-12)]s^{-2},$$
$$\mu_4 = 12(n-1)(n+3) + 12[(n+31)R_1 - (n^3+25n^2+44n-38)]s^{-1}$$
$$+ [112R_2 + 3R_1^2 - 2(3n^2+118n+658)R_1 + 3(n^4+44n^3+328n^2+488n-380)]s^{-2}$$
$$+ [R_3 - 4(n+28)R_2 - 3R_1^2 + 4(3n^2+56n+236)R_1 - 6(n^4+20n^3+116n^2+160n-120)]s^{-3},$$
$$\kappa_4 = 48(n-1) + 96[4R_1 - (3n^2+6n-5)]s^{-1} + 8[14R_2 - 2(14n+83)R_1 + 3(5n^3+41n^2+62n-48)]s^{-2}$$
$$+ [R_3 - 4(n+28)R_2 - 3R_1^2 + 4(3n^2+56n+236)R_1 - 6(n^4+20n^3+116n^2+160n-120)]s^{-3}. \qquad (2)$$

It will be seen that when any of the expected frequencies $p_i$ is very small, the moments may be considerably larger than those of the classical $\chi^2$, to which they approximate when the number $s$ in the sample is large.
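Because the moments are exact, they can be confirmed by brute force for small samples. The sketch enumerates every outcome of a multinomial sample, accumulates the exact cumulants of $\chi^2$, and compares them with equations (2) evaluated from $n$, $s$ and $R_1$, $R_2$, $R_3$. It is an independent numerical check that we propose; the function names and test cases are our own choices.

```python
import math


def compositions(s, n):
    """Yield every n-tuple of non-negative integers summing to s."""
    if n == 1:
        yield (s,)
        return
    for first in range(s + 1):
        for rest in compositions(s - first, n - 1):
            yield (first,) + rest


def exact_cumulants(p, s):
    """kappa_1..kappa_4 of chi^2 = sum (a_i - s p_i)^2 / (s p_i), by full enumeration."""
    raw = [0.0] * 5
    for a in compositions(s, len(p)):
        log_prob = math.lgamma(s + 1) + sum(
            ai * math.log(pi) - math.lgamma(ai + 1) for ai, pi in zip(a, p))
        prob = math.exp(log_prob)
        x2 = sum((ai - s * pi) ** 2 / (s * pi) for ai, pi in zip(a, p))
        for r in range(5):
            raw[r] += prob * x2 ** r
    m1, m2, m3, m4 = raw[1:]
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    k4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return m1, m2 - m1 ** 2, k3, k4


def formula_cumulants(p, s):
    """kappa_1..kappa_4 from equations (2)."""
    n = len(p)
    R1, R2, R3 = (sum(x ** -j for x in p) for j in (1, 2, 3))
    k1 = n - 1
    k2 = 2 * (n - 1) + (R1 - (n**2 + 2*n - 2)) / s
    k3 = (8 * (n - 1) + 2 * (11 * R1 - (9*n**2 + 18*n - 16)) / s
          + (R2 - (3*n + 22) * R1 + 2 * (n**3 + 9*n**2 + 14*n - 12)) / s**2)
    tail = (R3 - 4 * (n + 28) * R2 - 3 * R1**2 + 4 * (3*n**2 + 56*n + 236) * R1
            - 6 * (n**4 + 20*n**3 + 116*n**2 + 160*n - 120))
    k4 = (48 * (n - 1) + 96 * (4 * R1 - (3*n**2 + 6*n - 5)) / s
          + 8 * (14 * R2 - 2 * (14*n + 83) * R1 + 3 * (5*n**3 + 41*n**2 + 62*n - 48)) / s**2
          + tail / s**3)
    return k1, k2, k3, k4


for e, f in zip(exact_cumulants((0.2, 0.3, 0.5), 5), formula_cumulants((0.2, 0.3, 0.5), 5)):
    print(f"{e:.10f}  {f:.10f}")
```

Full enumeration is feasible only for small $s$ and $n$, but that is exactly the region where the correction terms in $s^{-1}$, $s^{-2}$ and $s^{-3}$ matter most.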


3. SPECIAL CASE OF TWO CLASSES (1 DEGREE OF FREEDOM)
When there are only two expected classes, with frequencies $p$ and $q$, i.e. $n = 2$, we have, if $k = \dfrac{1}{pq}$ (so that $R_1 = k$, $R_2 = k^2 - 2k$, $R_3 = k^3 - 3k^2$),

$$\mu_1' = \kappa_1 = 1,$$
$$\mu_2 = \kappa_2 = 2 + (k-6)s^{-1},$$
$$\mu_3 = \kappa_3 = 8 + 2(11k-56)s^{-1} + (k^2-30k+120)s^{-2},$$
$$\mu_4 = 60 + 12(33k-158)s^{-1} + (115k^2-2036k+6828)s^{-2} + (k^3-126k^2+1680k-5040)s^{-3},$$
$$\kappa_4 = 48 + 96(4k-19)s^{-1} + 16(7k^2-125k+420)s^{-2} + (k^3-126k^2+1680k-5040)s^{-3}. \qquad (3)$$

These expressions can of course be calculated independently, and furnish a useful check on the equations (2). When $s$ tends to infinity, provided neither $p$ nor $q$ tends to zero, we have $\kappa_2 = 2$, $\kappa_3 = 8$, $\kappa_4 = 48$, $\kappa_r = 2^{r-1}(r-1)!$, the values appropriate to Pearson's $\chi^2$. If, however, $sp$ remains equal to $g$ while $s$ tends to infinity we have:

$$\mu_1' = \kappa_1 = 1, \quad \mu_2 = \kappa_2 = 2 + g^{-1}, \quad \mu_3 = \kappa_3 = 8 + 22g^{-1} + g^{-2},$$
$$\mu_4 = 60 + 396g^{-1} + 115g^{-2} + g^{-3}, \quad \kappa_4 = 48 + 384g^{-1} + 112g^{-2} + g^{-3}. \qquad (4)$$

These are the moments and cumulants of $\chi^2 = (a-g)^2/g$ for a sample from a Poisson series when the expected value is $g$ and the observed value is $a$. If $V$ be the variance, $V = (a-g)^2 = g\chi^2$, so $\mu_r g^r$ and $\kappa_r g^r$ are the moments and cumulants of the variance of such a sample.


4. CASE OF $(m \times n)$-FOLD CLASSIFICATION ($m(n-1)$ DEGREES OF FREEDOM)


We now consider the values of $\chi^2$ in two-dimensional tables. Consider an $(m \times n)$-fold table where $m$ samples of $s_1, s_2, s_3, \ldots, s_r, \ldots, s_m$ members have been drawn independently from an infinite population in which the frequencies of $n$ classes are $p_1, p_2, \ldots, p_n$, and where, as above,

$$\sum_{i=1}^{n} p_i = 1, \qquad R_1 = \sum_{i=1}^{n} p_i^{-1}, \qquad R_2 = \sum_{i=1}^{n} p_i^{-2}, \qquad R_3 = \sum_{i=1}^{n} p_i^{-3}.$$

There are clearly $m(n-1)$ degrees of freedom. Summing the cumulants, given in equation (2), appropriate to the $\chi^2$ calculated from each sample, we have therefore:

$$\kappa_1 = m(n-1),$$
$$\kappa_2 = 2m(n-1) + [R_1 - (n^2+2n-2)]\sum_{r=1}^{m} s_r^{-1},$$
$$\kappa_3 = 8m(n-1) + 2[11R_1 - (9n^2+18n-16)]\sum_{r=1}^{m} s_r^{-1} + [R_2 - (3n+22)R_1 + 2(n^3+9n^2+14n-12)]\sum_{r=1}^{m} s_r^{-2},$$
$$\kappa_4 = 48m(n-1) + 96[4R_1 - (3n^2+6n-5)]\sum_{r=1}^{m} s_r^{-1} + 8[14R_2 - 2(14n+83)R_1 + 3(5n^3+41n^2+62n-48)]\sum_{r=1}^{m} s_r^{-2}$$
$$+ [R_3 - 4(n+28)R_2 - 3R_1^2 + 4(3n^2+56n+236)R_1 - 6(n^4+20n^3+116n^2+160n-120)]\sum_{r=1}^{m} s_r^{-3},$$
$$\mu_4 = \kappa_4 + 3\kappa_2^2.$$


5. CASE OF $(n \times 2)$-FOLD CLASSIFICATION ($n$ DEGREES OF FREEDOM)


These are the most general formulae arrived at in this paper. Special cases analogous to equations (1) and (3) obviously arise. Only the latter will be given, as it is used by Grüneberg and Haldane (1937). For $n$ samples each consisting of $s$ members, and each divided into two classes, whose expected values are $ps$ and $qs$, we have

$$\chi^2 = \sum_{r=1}^{n}\frac{(a_r - ps)^2}{pqs}, \qquad \text{degrees of freedom} = n,$$

where $a_r$ is the observed frequency in the first class of the $r$th sample. Further, writing $k = (pq)^{-1}$, we have for the moments and cumulants of $\chi^2$:

$$\mu_1' = \kappa_1 = n,$$
$$\mu_2 = \kappa_2 = ns^{-1}(2s + k - 6),$$
$$\mu_3 = \kappa_3 = ns^{-2}[8s^2 + (22k-112)s + k^2 - 30k + 120],$$
$$\mu_4 = ns^{-3}[12(n+4)s^3 + 12\{(k-6)n + 8(4k-19)\}s^2 + \{3n(k-6)^2 + 16(7k^2-125k+420)\}s + k^3 - 126k^2 + 1680k - 5040],$$
$$\kappa_4 = ns^{-3}[48s^3 + 96(4k-19)s^2 + 16(7k^2-125k+420)s + k^3 - 126k^2 + 1680k - 5040],$$
$$\beta_1 = \frac{[8s^2 + (22k-112)s + k^2 - 30k + 120]^2}{ns(2s + k - 6)^3}.$$

It will be seen that when $k$ is large compared with $s$, that is to say when one of the expectations is a small fraction, $\beta_1$ approximates to $k/(ns)$, whereas its value when $s$ tends to infinity is $8/n$. But for moderate values of $s$ the skewness may be considerably less than in the classical case.

Two numerical values of $k$ are important in genetics. If $p = q = \tfrac{1}{2}$, $k = 4$, and:

$$\mu_2 = \kappa_2 = 2n(s-1)s^{-1}, \qquad \mu_3 = \kappa_3 = 8n(s-1)(s-2)s^{-2},$$
$$\kappa_4 = 16n(s-1)(3s^2-15s+17)s^{-3}, \qquad \beta_1 = \frac{8(s-2)^2}{ns(s-1)}.$$

Thus for example when $n = 50$, if $s = 4$, $\beta_1 = .053$, whereas when $s$ is infinite $\beta_1 = .16$.
When $p = \tfrac{1}{4}$, $q = \tfrac{3}{4}$, $k = 16/3$, we have:

$$\mu_2 = \kappa_2 = \frac{2n(3s-1)}{3s}, \qquad \mu_3 = \kappa_3 = \frac{8n(9s^2+6s-13)}{9s^2},$$
$$\kappa_4 = \frac{16n(81s^3+378s^2-1284s+823)}{27s^3}, \qquad \beta_1 = \frac{8(9s^2+6s-13)^2}{3ns(3s-1)^3}.$$

For example if $n = 50$, $s = 4$, $\beta_1 = .24$.

The values of $s$ will generally vary from one sample to another. In this case the values of the cumulants are the sums of the values found for the different sample sizes.
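The worked values of $\beta_1$ in this section can be reproduced from the cumulants of the $(n \times 2)$-fold table. The sketch is a direct transcription of those formulas (the function name is hypothetical). It evaluates $\beta_1 = \kappa_3^2/\kappa_2^3$ for the two genetically important values $k = 4$ and $k = 16/3$, and for a very large $s$, where $\beta_1$ should approach the classical $8/n$.

```python
def beta1_n_by_2(n, s, k):
    """beta_1 = kappa_3^2 / kappa_2^3 for n two-class samples of s, k = 1/(p q)."""
    k2 = n * (2 * s + k - 6) / s
    k3 = n * (8 * s**2 + (22 * k - 112) * s + k**2 - 30 * k + 120) / s**2
    return k3 ** 2 / k2 ** 3


print(beta1_n_by_2(50, 4, 4))        # p = q = 1/2
print(beta1_n_by_2(50, 4, 16 / 3))   # p = 1/4, q = 3/4
print(beta1_n_by_2(50, 10**6, 4))    # s very large: close to 8/n = 0.16
```

Computing $\beta_1$ from $\kappa_2$ and $\kappa_3$, rather than from the simplified closed forms, keeps a single formula for every $k$.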


6. CASE OF $n$-FOLD CLASSIFICATION ($n$ DEGREES OF FREEDOM)


A limiting case of the $(n \times 2)$-fold classification arises when the values of $s_r$ tend to infinity, but the expectations, $g_r = ps_r$, in say the first category remain finite. If $a_r$ is the observed frequency in this category, it may be regarded as resulting from a single sample from a Poisson series with expected value $g_r$. Then

$$\chi^2 = \sum_{r=1}^{n}\frac{(a_r - g_r)^2}{g_r},$$

and using equations (4) we have

$$\mu_1' = \kappa_1 = n, \qquad \mu_2 = \kappa_2 = 2n + \sum g_r^{-1}, \qquad \mu_3 = \kappa_3 = 8n + 22\sum g_r^{-1} + \sum g_r^{-2},$$
$$\mu_4 = 12n(n+4) + 12(n+32)\sum g_r^{-1} + 3\Big(\sum g_r^{-1}\Big)^2 + 112\sum g_r^{-2} + \sum g_r^{-3},$$
$$\kappa_4 = 48n + 384\sum g_r^{-1} + 112\sum g_r^{-2} + \sum g_r^{-3}.$$

If we call $s$ the size of the whole sample, and let $g_r = sp_r$, while $R_1 = \sum p_r^{-1}$, $R_2 = \sum p_r^{-2}$, $R_3 = \sum p_r^{-3}$, we can write, in full analogy with equations (2):

$$\mu_1' = \kappa_1 = n,$$
$$\mu_2 = \kappa_2 = 2n + s^{-1}R_1,$$
$$\mu_3 = \kappa_3 = 8n + 22s^{-1}R_1 + s^{-2}R_2,$$
$$\mu_4 = 12n(n+4) + 12(n+32)s^{-1}R_1 + s^{-2}(3R_1^2 + 112R_2) + s^{-3}R_3,$$
$$\kappa_4 = 48n + 384s^{-1}R_1 + 112s^{-2}R_2 + s^{-3}R_3. \qquad (9)$$

The great simplicity of these expressions as compared with (2) is noteworthy. The extra terms in (2) represent diminutions in the moments due to the loss of one degree of freedom.
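Equations (4), on which this section rests, are easy to confirm numerically. The sketch sums the Poisson probabilities for $(a - g)^2/g$ directly, truncating the series where the remaining mass is negligible, and compares the resulting cumulants with $2 + g^{-1}$, $8 + 22g^{-1} + g^{-2}$ and $48 + 384g^{-1} + 112g^{-2} + g^{-3}$. The truncation rule and the function names are our own assumptions.

```python
import math


def poisson_chi2_cumulants(g):
    """Exact kappa_1..kappa_4 of (a - g)^2 / g when a is Poisson with mean g > 0."""
    upper = int(g + 40 * math.sqrt(g) + 60)   # Poisson mass beyond this is negligible
    raw = [0.0] * 5
    for a in range(upper + 1):
        prob = math.exp(a * math.log(g) - g - math.lgamma(a + 1))
        y = (a - g) ** 2 / g
        for r in range(5):
            raw[r] += prob * y ** r
    m1, m2, m3, m4 = raw[1:]
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    k4 = m4 - 4 * m1 * m3 - 3 * m2 ** 2 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4
    return m1, m2 - m1 ** 2, k3, k4


def equation_4_cumulants(g):
    """The limiting cumulants given in equations (4)."""
    return 1.0, 2 + 1 / g, 8 + 22 / g + g ** -2, 48 + 384 / g + 112 / g ** 2 + g ** -3


print(poisson_chi2_cumulants(2.5))
print(equation_4_cumulants(2.5))
```

For the $n$-fold case the cumulants of the individual categories are simply added, since the $n$ Poisson counts are independent.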


7. A WEIGHTING CORRECTION


If we have a number of samples of different sizes $s$, their variances will differ. Now when $s$ is large the probability that each sample will make a given contribution to $\chi^2$ is equal. If we wish to reinstate this condition as far as possible, we must arrange for a proper weighting of the contributions made from the various samples.

If we have $m$ samples, in each of which there are $n$ classes in expected numbers $sp_1$, $sp_2$, etc., $s$ being the number in the sample, taking the mean and variance from equation (2), we put for each sample:

$$\zeta = \frac{s^{1/2}[\chi^2 - (n-1)]}{[2(n-1)s - (n^2+2n-2) + R_1]^{1/2}}.$$

Then in each sample the mean of $\zeta$ is 0, and its variance 1. Hence for the $m$ samples the mean of the sum of the $\zeta$'s is 0 and its variance $m$. In the case of an $(n \times 2)$-fold table with $n$ degrees of freedom

$$\zeta = \frac{s^{1/2}(\chi^2 - 1)}{(2s + k - 6)^{1/2}}.$$

This weighting correction is hardly worth making in the Mendelian cases where $p = \tfrac{1}{2}$, $k = 4$, and $p = \tfrac{1}{4}$, $k = 16/3$. Here the weighting factors vary from .7071 for $s = \infty$ to 1 for $s = 2$ in the case when $k = 4$, and from .7071 when $s = \infty$ to .866 when $s = 1$ in the case when $k = 16/3$. But when $k$ is larger the variation is considerable. Thus, in the case of a selfed autotetraploid $p = 1/36$, and the weighting factor falls from .7071 when $s$ is infinite to .1740 when $s = 1$.
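The weighting factors quoted here are $\{s/(2s + k - 6)\}^{1/2}$, the reciprocal of the standard deviation of a single two-class $\chi^2$. The sketch reproduces them and forms the weighted sum of the $\zeta$'s. The helper names are hypothetical. The value of $k$ for the autotetraploid is taken as $1/(pq)$ with $p = 1/36$, which gives $k = 1296/35$.

```python
import math


def weighting_factor(s, k):
    """sqrt(s / (2s + k - 6)): multiplies (chi^2 - 1) for one two-class sample of size s."""
    denom = 2 * s + k - 6
    if denom <= 0:
        raise ValueError("chi^2 has zero variance for this s and k")
    return math.sqrt(s / denom)


def weighted_zeta_sum(chi2_values, sizes, k):
    """Sum of zeta_r = w(s_r) (chi2_r - 1); its mean is 0 and its variance the number of samples."""
    return sum(weighting_factor(s, k) * (x - 1.0) for x, s in zip(chi2_values, sizes))


k_tetraploid = 1 / ((1 / 36) * (35 / 36))   # p = 1/36
print(weighting_factor(2, 4), weighting_factor(1, 16 / 3), weighting_factor(1, k_tetraploid))
```

For $k = 4$ and $s = 1$ the factor is undefined, because a single individual with $p = q = \tfrac{1}{2}$ always gives $\chi^2 = 1$.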


8. Discussion 


The results obtained for the moments of x? in a (n x 2)-fold table when p is 
fixed do not agree with those given by Cochran (1936), who finds a mean value 


(n — 1)?(k—6) 
n—1, and a variance 2 (n — 1)+ These results only differ from those 


1 
here given by a factor of the order 1 — =e and are therefore satisfactory when 7 is 


large. However, when p is known and does not have to be estimated from the 
data there are clearly n degrees of freedom, and not n—1; hence my own results 
would appear to be slightly more accurate than Cochran’s. I have, however, no 
reason to doubt the accuracy of Cochran’s results when p is estimated from the 
totals. It is noteworthy that while in the cases considered here, where x? is used 
as a test of goodness of fit, its mean is always exactly equal to the number of 
degrees of freedom, this is no longer so when it is used as a test of homogeneity. 
Thus Cochran finds for a ( x 2)-fold table with n — 1 degrees of freedom, a mean y? 
of n—1 — a when all samples contain s members. 
ns ns 

It follows from the results here given that the distribution of χ² for large values
of n generally approximates fairly closely to normality. It would of course be
possible to find a function of χ² whose distribution is much more nearly normal.
Thus Wilson and Hilferty (1931) found that, when s is large,

$$\frac{\left(\chi^2/n\right)^{1/3} - 1 + \dfrac{2}{9n}}{\sqrt{2/(9n)}}$$

is very nearly normally distributed with mean zero and variance unity. It may
also be desirable, as Fisher (1922), Neyman and Pearson (1928) and Cochran
(1936) point out, to use the logarithm of the likelihood of the sample, rather than
χ², as a test of goodness of fit when expectations are small. It is, however, worth
pointing out that, in the estimation of the frequency of lethal genes in autosomes,
a problem with which I hope to deal later, χ² appears to furnish a simple and
satisfactory estimate, and its distribution must therefore be known.
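As an illustration of the Wilson and Hilferty transformation (in the standard form with n degrees of freedom), the sketch below converts a χ² value into an approximate normal deviate and its upper tail probability. The input, χ² = 21.86 on 13 degrees of freedom, is only an example value; it gives a tail probability just above 0.05. The function names are illustrative.

```python
import math

def wilson_hilferty_z(chi_sq, df):
    """Approximate standard normal deviate for a chi-square value on df degrees of freedom."""
    v = 2 / (9 * df)
    return ((chi_sq / df) ** (1 / 3) - (1 - v)) / math.sqrt(v)

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = wilson_hilferty_z(21.86, 13)
print(round(z, 3), round(normal_upper_tail(z), 4))
```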


9. Summary


Exact expressions have been found for the mean and the first four moments of
χ² in cases where it is used as a test of goodness of fit, that is to say in (m × n)-fold
tables with m(n − 1) degrees of freedom. The mean is always exactly m(n − 1).
The expressions for the higher moments are more complicated. Information has
therefore been obtained which will make it possible to apply the χ² test without
restriction on the size of the samples or on the numbers expected. The results do not
apply where χ² is used as a test of homogeneity, the expectations being deduced
from observed totals.
TESTS OF GOODNESS OF FIT APPLIED TO RECORDS
OF MENDELIAN SEGREGATION IN MICE 


By HANS GRÜNEBERG and J. B. S. HALDANE
Department of Genetics, University College, London 


It has long been known that in some, though not in all instances, the totals
in cases of segregation involving a small number of genes were in satisfactory 
accord with the simple numerical ratios expected according to Mendel’s laws. 
Divergences could often be explained by selective mortality of zygotes between 
the time of fertilization and the time when the characters in question could be 
determined. 

But where the totals were in agreement with Mendelian expectation it was 
not clear that individual families might not show an unexpected number, either 
larger or smaller than that expected on sampling theory, of large deviations 
from the expected ratio. 

Where the families, and the expectations in all groups, were sufficiently large, 
it was possible to apply Pearson’s classical χ² method (e.g. de Winton & Haldane,
1933). Where samples were smaller this was no longer possible. However, Haldane
(1937, pp. 133-43 above) has calculated the moments of the distribution of χ²
when expectations are small.

If we are dealing with n samples from a population in which two classes occur
with frequencies p and 1 − p, and if s_r is the size of the rth sample, then the principal
parameters of the distribution of χ² are given in Table I for the two Mendelian
cases where p = 1/2 and p = 1/4. These results follow if we put m = n, n = 2 in Haldane’s
equation (5) (p. 139 above).


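TABLE I
Distribution parameters of χ² in Mendelian cases

$$
\begin{array}{lcc}
 & p = \tfrac{1}{2} & p = \tfrac{1}{4} \\[4pt]
\text{Mean} & n & n \\[4pt]
\text{Variance} & 2\left(n - \sum s_r^{-1}\right) & \tfrac{2}{3}\left(3n - \sum s_r^{-1}\right) \\[8pt]
\beta_1 & \dfrac{8\left(n - 3\sum s_r^{-1} + 2\sum s_r^{-2}\right)^2}{\left(n - \sum s_r^{-1}\right)^3} & \dfrac{8\left(9n + 6\sum s_r^{-1} - 13\sum s_r^{-2}\right)^2}{3\left(3n - \sum s_r^{-1}\right)^3} \\[12pt]
\gamma_2 = \beta_2 - 3 & \dfrac{4\left(3n - 18\sum s_r^{-1} + 32\sum s_r^{-2} - 17\sum s_r^{-3}\right)}{\left(n - \sum s_r^{-1}\right)^2} & \dfrac{4\left(81n + 378\sum s_r^{-1} - 1284\sum s_r^{-2} + 823\sum s_r^{-3}\right)}{3\left(3n - \sum s_r^{-1}\right)^2}
\end{array}
$$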
In the case of our data it will be shown that n is often so large that β1 (or γ1)
and γ2 are of the order of their own sampling errors or less. The distribution of
χ² can thus be treated as approximately normal.
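The variances in Table I can be checked by simulation. The sketch below is a Monte Carlo check, not part of the original analysis. It draws litters of assumed sizes, computes each litter χ² as (b − sp)²/(spq), where b is the number of recessives, and compares the sample variance of the total with the Table I expressions. The litter sizes, seeds and replicate counts are arbitrary choices.

```python
import random

def litter_chi_sq(b, s, p):
    """1-d.f. chi-square for a litter of s with b recessives; recessive probability p."""
    return (b - s * p) ** 2 / (s * p * (1 - p))

def simulate_total(sizes, p, reps, seed=1):
    """Mean and variance of the summed litter chi-squares over `reps` simulated sets."""
    rng = random.Random(seed)
    totals = []
    for _ in range(reps):
        total = 0.0
        for s in sizes:
            b = sum(rng.random() < p for _ in range(s))  # binomial(s, p) draw
            total += litter_chi_sq(b, s, p)
        totals.append(total)
    mean = sum(totals) / reps
    var = sum((t - mean) ** 2 for t in totals) / (reps - 1)
    return mean, var

def table_i_variance(sizes, p):
    """Table I: 2(n - sum 1/s) for p = 1/2, (2/3)(3n - sum 1/s) for p = 1/4."""
    n = len(sizes)
    inv = sum(1 / s for s in sizes)
    if p == 0.5:
        return 2 * (n - inv)
    if p == 0.25:
        return (2 / 3) * (3 * n - inv)
    raise ValueError("Table I covers only p = 1/2 and p = 1/4")

sizes = [2, 3, 4, 5] * 5  # 20 small litters (illustrative)
mean, var = simulate_total(sizes, 0.5, 10000)
print(mean, var, table_i_variance(sizes, 0.5), 2 * len(sizes))
```

With litters this small, the simulated variance should sit near the Table I value (about 27.2) rather than the classical 2n = 40.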

Our experimental data are as follows: 


(1) Records of 562 litters including 3707 mice, from back-crosses of mice
heterozygous for the normal colour gene C and its allelomorph cᵈ to recessives.
These were obtained during an experiment (Grüneberg, 1936) on the linkage of
C with two other genes. The ratios obtained for these two genes are closely
correlated with those for C and cᵈ, and hence will not be given, as they do not
give independent information.


(2) Records of 273 litters including 1366 mice, from a back-cross Ccᵈ × cᵈcᵈ
and reciprocally. These are selected. In many cases the full-coloured parent
was not known to be heterozygous for cᵈ. Families of less than 7 were rejected
if the full-coloured parent was not known to be heterozygous. (By a family is
meant a group of litters from the same two parents.)


(3) Records of 243 litters including 1198 mice, from matings Ccᵈ × Ccᵈ.
Here again families of less than 16 individuals were rejected unless both parents
were known to be heterozygous.


(4) Records of 226 litters including 1279 mice, obtained by Fisher & Mather 
(1936) in the course of a linkage experiment, and very kindly put at our disposal 
by the authors. These litters were derived from matings of mice heterozygous 
for five or six genes, with multiple recessives. One of these genes (for recessive 
light head) was not recorded in all litters, and gave aberrant ratios. The other 
five, and sex, were recorded in all litters (except blue dilution in one). The 
records given to us cover 50 mice beyond those on which Fisher & Mather’s 
published results are based. 


All these data, totalling 7550 mice in 1304 litters, are collected in Table II.
The first 562 litters are divided into five groups (I, II, III, IV and V) representing
different experiments (see Grüneberg, 1936). The total is also given. The table is
to be read as follows. The first column gives litter size. The next two, headed
D and r, give the distribution as between dominants and recessives, or in the case of
(4) for sex, as between males and females. Subsequent columns give the numbers
of litters of this type in the various experiments.

Table III gives the totals in the various experiments, each χ² having 1 degree
of freedom. In only two cases (1, III and 4, blue) does the deviation exceed twice
the standard error. If the data of Exp. 1 are combined and a single χ² found
for them, the total χ² for the four experiments is 10.87 for 9 degrees of freedom, a
very moderate value. If they are considered separately, χ² = 21.86 for 13
degrees of freedom. P now just exceeds 0.05, which is generally taken as the
criterion of significance.
TABLE II (continued)
TABLE III
Exp.          Dominants   Recessives   Expectation     χ²
1, I             567         577          572         0.087
1, II            162         198          180         3.600
1, III           528         457          492.5       5.118
1, IV            306         326          316         0.645
1, V             277         309          293         1.747
1, Total        1840        1867         1853.5       0.197*
2                685         681          683         0.012
3                899         299          299.5       0.001
4, sex           662         617          639.5       1.583
4, agouti        608         671          639.5       3.103
4, brown         648         631          639.5       0.226
4, spotting      636         643          639.5       0.038
4, wavy          658         621          639.5       1.070
4, blue          673         596          634.5       4.636

* This value of χ² is not the total of the five values given above it, but the value, having 1 degree
of freedom, calculated from the totalled dominants and recessives of Exp. 1.


In all the cases the expectations are, of course, so large that we can use the
classical χ² with complete confidence.
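The single-degree-of-freedom values in Table III follow from the usual two-class Pearson formula. The minimal sketch below recomputes a few rows from the dominant and recessive totals. It uses a 1 : 1 expectation for the back-crosses and the sex ratio, and 3 : 1 for Exp. 3. The function name is illustrative.

```python
def two_class_chi_sq(dominants, recessives, ratio=1):
    """Pearson chi-square (1 d.f.) for dominants : recessives against ratio : 1."""
    total = dominants + recessives
    exp_dom = total * ratio / (ratio + 1)
    exp_rec = total / (ratio + 1)
    return ((dominants - exp_dom) ** 2 / exp_dom
            + (recessives - exp_rec) ** 2 / exp_rec)

rows = {
    "1, I": (567, 577, 1),
    "1, III": (528, 457, 1),
    "3": (899, 299, 3),
    "4, sex": (662, 617, 1),
    "4, agouti": (608, 671, 1),
}
for name, (d, r, ratio) in rows.items():
    print(name, round(two_class_chi_sq(d, r, ratio), 3))
```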

The calculation of χ² for each experiment from the data of Table II is rapid
and simple. If a and b are the numbers of dominants and recessives, then in the
case of a back-cross, where equality is expected, we multiply the numbers of
litters containing a dominants and b recessives by (a − b)², sum the products for
each value of s, and divide the sum by s. This gives the contribution to χ² made
by litters of that particular size.

For example in the case of the total of Exp. 1 and litters of 6 mice the calcu- 
lations are as follows: 


Dominants   Recessives   (a − b)²     n    n(a − b)²
    6            0           36        0         0
    5            1           16       12       192
    4            2            4       15        60
    3            3            0       24         0
    2            4            4       21        84
    1            5           16       11       176
    0            6           36        2        72
Total                                 85       584


Hence for litters of s = 6, χ² = 584/6 = 97.3, the expected value as a result of
random sampling being 85.
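The worked example above can be written as a short function. The sketch below uses exact rational arithmetic so that 584/6 is reproduced without rounding. The dictionary keys are the numbers of dominants in a litter of six, and the function name is illustrative.

```python
from fractions import Fraction

def backcross_contribution(litters, s):
    """Contribution to chi-square from back-cross litters of size s.

    `litters` maps the number of dominants a (0..s) to the number of litters
    with a dominants and b = s - a recessives; each litter adds (a - b)^2 / s.
    """
    total = sum(count * (a - (s - a)) ** 2 for a, count in litters.items())
    return Fraction(total, s)

# Litters of 6 in Exp. 1 (all groups), as in the worked example.
litters_of_six = {6: 0, 5: 12, 4: 15, 3: 24, 2: 21, 1: 11, 0: 2}
chi_sq = backcross_contribution(litters_of_six, 6)
print(sum(litters_of_six.values()), chi_sq, float(chi_sq))
```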

The results of applying this method to the data of Exp. 1 are given in Table IV,
and compared with expectations in Table V. It will be seen that in every case χ²
TABLE IV
          1, I          1, II         1, III        1, IV         1, V          1, Total
 s      n     χ²      n     χ²      n     χ²      n     χ²      n     χ²      n      χ²
 1      1   1.000     6   6.000     0     -       3   3.000     1   1.000    11   11.000
 2      7   4.000     6   8.000     4   0.000     -     -       4   2.000    21   14.000
 3     14  20.667     6   7.333    10   6.000     6   4.667     4   4.000    40   42.667
 4     18  15.000     7   5.000    11  20.000     1   0.000     5  10.000    42   50.000
 5     24  33.600    11  10.200    22  26.800     4   2.400     5   1.000    66   74.000
 6     25  22.667    10  18.667    29  29.333    13  16.667     8  10.000    85   97.333
 7     20  17.714     6   7.714    19  14.143    10   9.429    10  10.571    65   59.571
 8     21  22.500     8   5.500    33  31.500    13  28.500    21  29.500    96  117.500
 9     19  29.667     5   6.778    18  25.111    20  24.444    14  14.889    76  100.889
10     20  18.800     3   2.000     6   7.200    12  24.800     7  15.200    48   68.000
11      6   7.818     -     -       -     -       1   0.818     -     -       7    8.636
12      -     -       -     -       -     -       2   0.333     2   3.000     4    3.333
13      -     -       -     -       -     -       -     -       -     -       -      -
14      -     -       -     -       -     -       -     -       1   0.286     1    0.286
Total 175 193.433    68  77.192   152 160.087    85 115.058    82 101.446   562  647.215
TABLE V
Exp.          χ²        n     d = χ² − n     σ       d/σ
1, I        193.43     175      +18.43      16.87   +1.09
1, II        77.19      68       +9.19       9.96   +0.92
1, III      160.09     152       +8.09      15.83   +0.51
1, IV       115.06      85      +30.06      10.96
1, V        101.45      82      +19.45      11.62   +1.67
1, Total    647.22     562      +85.22      30.12   +2.83


exceeds its expectation, that the excess is significant in the total and in 1, IV, 
and is very probably so in 1, V. 

The effect of the corrections to Pearson’s χ² is of interest. The variance of the
total is reduced from 2n, or 1124, to 2(n − Σ 1/s_r), or 907.46. Thus σ is reduced
from 33.52 to 30.12, and (χ² − n)/σ is increased from 2.54 to 2.83. The normality of
the distribution of χ² is also improved. β1 is reduced from 0.0258 to 0.0079, and
γ2 from 0.0387 to 0.0097. The closeness of the approach to normality may be
realized as follows. The variance of γ1 = √β1 in a sample of N from a normal
population is approximately 6/N. Hence a value of β1 = 0.0079 would be found
about once in three times in a sample of 6/β1, or 760 individuals, from a normal
population. It is in fact negligible.
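The corrected variance and β₁ quoted above can be recomputed from the distribution of litter sizes. The sketch below is an illustration, not the authors' own computation. It takes the number of litters of each size from the Total column of Table IV and applies the p = 1/2 expressions of Table I. The dictionary and function names are hypothetical.

```python
import math

# Number of litters of each size s in Exp. 1 (Total column of Table IV).
LITTERS_BY_SIZE = {1: 11, 2: 21, 3: 40, 4: 42, 5: 66, 6: 85, 7: 65,
                   8: 96, 9: 76, 10: 48, 11: 7, 12: 4, 14: 1}

def backcross_parameters(litters_by_size):
    """Return n, the variance and beta_1 of total chi-square for p = 1/2 (Table I)."""
    n = sum(litters_by_size.values())
    s1 = sum(k / s for s, k in litters_by_size.items())       # sum of 1/s_r
    s2 = sum(k / s ** 2 for s, k in litters_by_size.items())  # sum of 1/s_r^2
    variance = 2 * (n - s1)
    beta1 = 8 * (n - 3 * s1 + 2 * s2) ** 2 / (n - s1) ** 3
    return n, variance, beta1

n, variance, beta1 = backcross_parameters(LITTERS_BY_SIZE)
chi_sq_total = 647.215  # total chi-square for Exp. 1, Table IV
sigma = math.sqrt(variance)
print(n, round(variance, 2), round(sigma, 2),
      round((chi_sq_total - n) / sigma, 2), round(beta1, 4))
print("classical:", 2 * n, round(math.sqrt(2 * n), 2),
      round((chi_sq_total - n) / math.sqrt(2 * n), 2))
```

This gives n = 562, the variance 907.46, σ = 30.12 and the ratio 2.83. β₁ comes out at about 0.0078, close to the printed 0.0079.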
In Exp. 1, II, which gave only 68 litters, β1 = 0.0602, β2 = 3.00145. The
skewness is hardly worth considering in a test of significance.

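Exps. 2 and 3 give the results shown in Table VI. To calculate the values of
χ² in Exp. 3 we note that for a litter of s containing a dominants and b recessives

$$\chi^2 = \frac{\left(a - \tfrac{3s}{4}\right)^2}{\tfrac{3s}{4}} + \frac{\left(b - \tfrac{s}{4}\right)^2}{\tfrac{s}{4}} = \frac{(a - 3b)^2}{3s}.$$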
TABLE VI
           Exp. 2              Exp. 3
 s       n       χ²          n       χ²
 1      11     11.000                12.667
 2             24.000                24.667
 3             34.000       33       30.333
 4      35     27.000       36       44.000
 5      48     39.000       36       46.133
 6             48.667                31.556
 7       8     42.857       40       53.714
 8      17     13.500                12.667
 9             21.889        5        9.370
10              0.000        3        4.667
11       1      0.091        0        0.000
Total  273    262.004      243      269.774


Hence if we multiply the number of each litter type by (a − 3b)² and divide the
total for each litter size by 3s we obtain the contribution of that litter size to χ².
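The shortcut rests on the identity that the two-class χ² for a 3 : 1 expectation reduces to (a − 3b)²/(3s). The sketch below checks it exactly with rational arithmetic for every split of litters of up to 12. The three litters of 10 at the end are hypothetical and show how contributions are summed.

```python
from fractions import Fraction

def pearson_3_to_1(a, b):
    """Full two-class Pearson chi-square for a dominants, b recessives, 3 : 1 expected."""
    s = a + b
    exp_dom = Fraction(3 * s, 4)
    exp_rec = Fraction(s, 4)
    return (a - exp_dom) ** 2 / exp_dom + (b - exp_rec) ** 2 / exp_rec

def shortcut_3_to_1(a, b):
    """Shortcut used in the text: (a - 3b)^2 / (3s)."""
    return Fraction((a - 3 * b) ** 2, 3 * (a + b))

# Check the identity for every split of litters of size 1 to 12.
for s in range(1, 13):
    for b in range(s + 1):
        assert pearson_3_to_1(s - b, b) == shortcut_3_to_1(s - b, b)

# Hypothetical litters of 10: (dominants, recessives) pairs.
litters = [(7, 3), (9, 1), (10, 0)]
print(sum(shortcut_3_to_1(a, b) for a, b in litters))
```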

The deviation in Exp. 2 is −10.996, its standard error being 20.04. The
deviation in Exp. 3 is +26.77, its standard error being 18.89. Thus neither
deviation is significant. We shall later have to consider the effect on χ² of selecting
our material.

The results of applying the χ² test to Exp. 4 are given in Table VII. It will
be seen that five out of the six values of χ² are less than their expectation. The
variance is 353.41 except in the case of blue dilution, where it is 351.61. It will
be seen that none of the deviations, taken by itself, is significant. The total value
of χ² is 1277.38, its expectation being 1355 ± 46.03. The deviation is −1.69 times
the standard error, which again is not significant. A considerably larger negative
deviation would have suggested that the authors had suppressed a few aberrant
families. An application of χ² to certain published work would, we are inclined
to believe, give ground for such a suggestion.
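The overall figure for Exp. 4 combines the six columns of Table VII. If the six characters are treated as independent, as the quoted standard error implies, the expectations and the variances simply add. The sketch below uses the column totals of Table VII and the variances given above, and takes the expectation for blue as 225. The dictionary names are illustrative.

```python
import math

# Column totals of chi-square in Table VII (Exp. 4).
chi_sq = {"sex": 214.558, "non-agouti": 195.502, "wavy": 197.157,
          "spotting": 208.253, "brown": 250.821, "blue": 211.091}
# Expectation = number of litters scored (225 for blue dilution).
expectation = {name: 226 for name in chi_sq}
expectation["blue"] = 225
variance = {name: 353.41 for name in chi_sq}
variance["blue"] = 351.61

total = sum(chi_sq.values())
expected = sum(expectation.values())
sd = math.sqrt(sum(variance.values()))  # variances add for independent characters
print(round(total, 2), expected, round(sd, 2), round((total - expected) / sd, 2))
```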

We must next ask whether the large positive deviations of Exp. 1 can be 
explained. In order to analyse this experiment further the mice were grouped, 
TABLE VII
                                            χ²
 s       n      Sex      Non-agouti    Wavy     Spotting    Brown      Blue
 1       4     4.000       4.000       4.000      4.000      4.000     4.000
 2      11    14.000      10.000      12.000     12.000     10.000     8.000
 3      24    13.333      32.000      10.667     16.000     34.667    37.333
 4      29    23.000      33.000      36.000     22.000     33.000    31.000
 5      43    48.600      26.200      39.000     45.400     32.600    39.000
 6      33    32.667      22.667      26.667     40.000     37.333    26.000
 7      30    29.429      26.000      27.143     32.857     39.714    34.000
 8      32    39.000      24.500      32.500     24.000     38.000    20.000
 9      15     6.111      11.444       7.889      8.778     15.889     9.667
10       4*    3.600       5.600       1.200      2.400      4.800     2.000
11       1     0.818       0.091       0.091      0.818      0.818     0.091
Total  226   214.558     195.502     197.157    208.253    250.821   211.091
d             −11.44      −30.50      −28.84     −17.75     +24.82    −13.91
d/σ            −0.61       −1.62       −1.53      −0.94      +1.32     −0.74

* One family of 10 was not scored for blue dilution.


not in litters, but in families. The result is shown in Table VIII. The numbers
of families, except in Exp. 1, I, and the total, are so small that the distribution
of χ² is far from normal. And some of them are so small that the classical dis-
tribution is also inapplicable. It is clear, however, that the total χ² exceeds


TABLE VIII
Exp. 1, litters grouped in families
Exp.        Number of    Number of      χ²          d          σ
            families     mice
1, I           107         1144       126.58     +19.58      13.56
1, II           26          360        33.03      +7.03       6.70
1, III          64          985        90.84     +26.84      10.72
1, IV           31          632        46.22     +15.22       7.54
1, V            25          586        33.74      +8.74       6.79
1, Total       253         3707       330.40     +77.40      21.13


its expectation by 3.66 times its standard error, and the divergences in Exps.
1, III and 1, IV must probably be regarded as significant. In each case, however,
the divergence was mainly due to a single family. A family of 9 dominants and
no recessive contributed 9 to the χ² of 1, III, and a family of 3 dominants and
14 recessives contributed 7.12 to the χ² of 1, IV. Were these families omitted no
single experiment would yield a significant result, though their total would do so.

It is notable that when the litters are grouped by families, although n is
reduced to 45% of its former value, the excess of χ² is only reduced from 85.2
to 77.4. Thus there is little heterogeneity due to divergence of litters within
families. We must look for a cause which affects families rather than litters. The
following are possible causes:


(1) The presence of recessive lethal or sublethal genes linked with those 
segregating. 


(2) The presence of monozygotic twins or multiplets. 


(3) The existence of environmental factors which on some occasions favoured 
coloured mice, on others dilute mice, although on the whole they favoured neither. 


(4) Abnormalities of meiosis leading to production of gametes in unequal 
numbers where equality was expected. 


Only the first, and possibly the fourth of these, would affect families rather
than litters. This is plausible on general grounds. The presence of the same lethal
or sublethal gene in both parents is only likely as the result of inbreeding. There
was inbreeding in all five parts of Exp. 1, but a perusal of Grüneberg’s (1936) paper
makes it clear that it was least in 1, I and 1, II, greater in 1, III, and greatest in
1, IV and 1, V. Our results do not prove the segregation of lethal genes, but they
are consistent with it. On the other hand such genes must have been rare or
absent in Exps. 2, 3 and 4, although 2 and 3 involved a good deal of inbreeding
and 4 a certain amount. It is hoped, by the application of this method to man, to
obtain at least an upper limit to the frequency of lethals in human chromosomes,
and possibly to obtain evidence suggesting their presence.

We must next consider the effect of selection on Exps. 2 and 3. In Exp. 2
the fully coloured parents were not always known to be heterozygous, and
families of less than 7 were excluded unless the parent was known to be so. We
may, however, have neglected some families of 7 or over derived from a hetero-
zygous parent because they contained no dilute animals. The probability of such
a family is 2⁻⁷ or less. Thirty-six families of 7 or more were inferred to be derived
from a heterozygous parent because they contained one or more extreme dilute
mice. The probability that such a group taken at random from the progeny of
a heterozygote and a homozygote would have contained a family with no extreme
dilute mice is 0.0365. A family of 7 would have contributed 7 to χ². Thus the
value of χ² should be increased by about 0.25 of a unit to compensate for the effect
of selection.

Similarly in Exp. 3 there were 30 families ranging in size from 16 to 35
which were inferred to be derived from matings of two heterozygotes because
they contained at least one extreme dilute mouse. The probability that a family
of n derived from two heterozygotes will include no recessive is (3/4)ⁿ. This is equal
to 0.0100 or less. Among 30 families of the given sizes the probability that one
has been excluded on this ground is 0.0838. Such a family would contribute
fairly heavily to χ². For example a family of 18 would contribute 6. It is con-
cluded that the selection practised has probably reduced the value of χ² by less
than half a unit. The data have therefore been legitimately used.
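The probabilities and χ² contributions used in this argument are easy to reproduce. The 0.0365 and 0.0838 group probabilities depend on the individual family sizes, which are not listed here, so the sketch takes 0.0365 as given rather than recomputing it. The function names are illustrative.

```python
def p_no_recessive(size, recessive_prob):
    """Probability that a family of `size` offspring contains no recessive."""
    return (1 - recessive_prob) ** size

def chi_sq_all_dominant(size, ratio):
    """Chi-square of a family with a = size dominants and b = 0 recessives,
    tested against ratio : 1; (a - ratio*b)^2 / (ratio*size) reduces to size / ratio."""
    return size ** 2 / (ratio * size)

# Exp. 2 (back-cross, 1 : 1): a family of 7 with no dilute young.
print(p_no_recessive(7, 0.5), chi_sq_all_dominant(7, 1))
# Expected loss of chi-square: probability 0.0365 (taken from the text) times 7.
print(0.0365 * chi_sq_all_dominant(7, 1))
# Exp. 3 (intercross, 3 : 1): families of 16 or more, and a family of 18.
print(p_no_recessive(16, 0.25), chi_sq_all_dominant(18, 3))
```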


SUMMARY 


The χ² test has been applied to data on Mendelian segregation in 1304 mouse
litters containing 7550 mice. As some litters were scored for as many as six
characters we have effectively 2433 litters, and 13935 mice. In 7 experiments χ²
exceeded its expectation, in 6 it fell below it. None of the negative deviations
was significant, but one of the positive ones was so. This is tentatively ascribed
to the effect of recessive lethal genes in upsetting Mendelian segregation in an
inbred population.
MISCELLANEA
(i) An Application of the Method of Maximum Likelihood 


By WALTER A. HENDRICKS 
Bureau of Animal Industry, United States Department of Agriculture 


In a recent paper Pearson (1936) presented some rather critical observations on the inter- 
pretation of results derived from applications of the method of maximum likelihood. The 
author of the present paper has no desire to attempt a theoretical justification of the 
method of maximum likelihood. He can, however, add another scrap of evidence to the 
mounting total which has done much to justify the method by the principle of induction. 

Suppose that a set of 17 individuals, drawn from some universe, may be divided into
four classes. Let the numbers of individuals in the four classes be 3, 7, 2, and 5, respectively.
Assume that a hypothesis regarding the universe causes us to expect twice as many in-
dividuals in the second class as in the first, and twice as many in the fourth class as in the
third. In other words, assume that the respective probabilities for the four classes are p₁,
2p₁, p₂, and 2p₂. Let it be required to obtain estimates of p₁ and p₂ from the observed
distribution of 17 individuals.

The investigator equipped with a knowledge of the more elementary aspects of simple
sampling would doubtless proceed by reasoning that the probability of the occurrence of an
individual in either the first or second class is equal to 3p₁. He would equate this to the
observed proportion, 10/17, of individuals in the two classes and solve the resulting equa-
tion, thus obtaining 10/51 or .19607843 for his estimate of p₁. He would then apply the
same process to the data in the third and fourth classes, or he would make use of the
relation,

$$3p_1 + 3p_2 = 1, \qquad (1)$$

to reach the conclusion that p₂ is equal to 7/51 or .13725490.
It is of interest to note that this solution, which would appeal to the experienced in- 
vestigator, is exactly that to which the method of maximum likelihood leads. 
In applying the method of maximum likelihood to the above problem, we are required
to determine p₁ and p₂ so as to give the maximum value to the quantity, L, defined by

$$L = 3\log p_1 + 7\log 2p_1 + 2\log p_2 + 5\log 2p_2, \qquad (2)$$


subject to the condition imposed by equation (1). This is equivalent to determining p₁, p₂,
and λ in such a manner as to give the maximum value to the quantity, Y, defined by

$$Y = 10\log p_1 + 7\log p_2 + \lambda(3p_1 + 3p_2 - 1). \qquad (3)$$

The required values of p₁, p₂, and λ are given by the solution of the equations:

$$\begin{aligned} 10/p_1 + 3\lambda &= 0,\\ 7/p_2 + 3\lambda &= 0,\\ 3p_1 + 3p_2 &= 1, \end{aligned} \qquad (4)$$

from which the values of p₁ and p₂ are found to be 10/51 and 7/51, respectively, exactly as
before.
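A direct numerical maximization of L confirms the closed-form result. The sketch below eliminates p₂ through (1) and maximizes (2) over p₁. It uses a golden-section search, chosen only for simplicity; the names are illustrative.

```python
import math

COUNTS = (3, 7, 2, 5)  # observed numbers in the four classes

def log_likelihood(p1):
    """L of equation (2), with p2 eliminated through constraint (1): 3p1 + 3p2 = 1."""
    p2 = 1 / 3 - p1
    probs = (p1, 2 * p1, p2, 2 * p2)
    return sum(c * math.log(q) for c, q in zip(COUNTS, probs))

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2

p1_hat = golden_section_max(log_likelihood, 1e-9, 1 / 3 - 1e-9)
print(p1_hat, 10 / 51, 1 / 3 - p1_hat, 7 / 51)
```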


It is also of some interest to obtain estimates of p₁ and p₂ in such a manner that the
familiar criterion of goodness of fit, χ², as applied to the comparison of observed and
theoretical frequencies in the four classes, shall have its minimum value.

The value of χ² is given by the relation

$$\chi^2 = \frac{9}{17p_1} + \frac{49}{34p_1} + \frac{4}{17p_2} + \frac{25}{34p_2} - 17. \qquad (5)$$
To render this quantity a minimum, subject to the condition imposed by equation (1),
we may determine p₁, p₂, and λ in such a manner as to minimize the quantity, z, defined by

$$z = \frac{67}{p_1} + \frac{33}{p_2} + \lambda(3p_1 + 3p_2 - 1). \qquad (6)$$

The required values of p₁, p₂, and λ are given by the solution of the equations,

$$\begin{aligned} -67/p_1^2 + 3\lambda &= 0,\\ -33/p_2^2 + 3\lambda &= 0,\\ 3p_1 + 3p_2 &= 1, \end{aligned}$$

from which the values of p₁ and p₂ are found to be .19586990 and .13746343, respectively.
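The minimum-χ² estimates can be obtained in closed form. Dividing the first two equations gives p₁²/p₂² = 67/33, and the constraint then fixes the scale. The sketch below computes those values and checks them against a direct evaluation of Pearson's χ² for the four classes. The names are illustrative.

```python
import math

COUNTS = (3, 7, 2, 5)
N = sum(COUNTS)  # 17 individuals

def chi_square(p1):
    """Pearson chi-square for the four classes, with p2 = 1/3 - p1 from (1)."""
    p2 = 1 / 3 - p1
    expected = [N * q for q in (p1, 2 * p1, p2, 2 * p2)]
    return sum((o - e) ** 2 / e for o, e in zip(COUNTS, expected))

# The Lagrange equations give p1^2 / p2^2 = 67 / 33; with 3p1 + 3p2 = 1:
root67, root33 = math.sqrt(67), math.sqrt(33)
p1_min = root67 / (3 * (root67 + root33))
p2_min = root33 / (3 * (root67 + root33))
print(f"{p1_min:.8f} {p2_min:.8f}")
```

These agree with the printed .19586990 and .13746343 to about seven decimal places.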
These estimates of p, and p, differ very little from those obtained by the two preceding 
methods. However, they are different. This is not surprising, since the application of the 
χ² test to data such as those under consideration involves certain well-known approxima-
tions. The fact that the differences are rather small is in agreement with results obtained by
Fisher (1934) in a similar comparison of methods of estimation. 
The most interesting feature of the results presented in this paper is the fact that the 
method of maximum likelihood, as applied to the present problem, led to results which are
in exact agreement with those obtained as natural consequences of the established theory
of simple sampling. 
(ii) Maximum Likelihood and Methods of Estimation
By E. S. PEARSON


In the preceding Note Mr Walter A. Hendricks has given an example of the difference
between the result of applying the methods of maximum likelihood and of minimum χ².
The illustration is interesting and suggestive although I do not know whether it can be
described as making any contribution towards justifying the general application of the
method of maximum likelihood. When two or more alternative methods of estimation are
available, we may choose between them in many ways, e.g. by an appeal to intuition or by
an appeal to practical expediency. On both these counts the method of maximum likeli-
hood would, in the present instance, score points over the method of minimum χ². But
when we turn to the alternative methods of fitting frequency curves, with which my father
was concerned in the paper referred to, e.g. (i) by maximum likelihood; (ii) by minimum
χ²; (iii) by moments, the choice is far less easy to make. From the point of view of any
commonly experienced sense of intuition, I fancy there can be no unique answer; from the
point of view of practical expediency, in many cases the moment method clearly wins.

As I have mentioned elsewhere in this Journal,* it is I think the practical worker who 
will give the final casting vote between alternative theoretical methods, basing his decision 
on considerations of practical utility. In the growing complexity of mathematical statistics 
it is, however, often difficult for him to make his choice without the aid of simple guiding 
principles which appeal to his intuition. The concept of maximum likelihood involves such 
a principle. To assign to an unknown probability, p, that value which, if it were correct, 
would make the occurrence of the observed result more likely than any other value, is a 
procedure which has a simple intuitional appeal. But clearly there may be other guiding 
principles which are equally useful to follow. 


* Pp. 53-64 above. 
Consider Mr Hendricks’s illustration. It is almost certain that neither of the 8-decimal-
place estimates of p₁ which he gives, .19607843 and .19586990, is the true population
value. What would be of more use to the practical man than either of these would be a rule
for obtaining from the sample an upper and lower estimate for p₁, say $\bar{p}_1$ and $\underline{p}_1$, such that
the statement

$$\underline{p}_1 \le p_1 \le \bar{p}_1 \qquad (1)$$

might be made with a certain measure of confidence, expressed by the risk of error, say α,
involved in the statement. In so far as the method permits the adjustment of the values of
α and of the breadth of the interval $(\underline{p}_1, \bar{p}_1)$, it would have a greater appeal than any method
which only provides a single-valued estimate of p₁. If now two alternative methods are
available, both enabling us to estimate an interval $(\underline{p}_1, \bar{p}_1)$, we may make a choice between
them, basing that choice upon a comparison of (a) the risk of error involved in the state-
ment (1), and (b) the breadth of the intervals.
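One rule of the kind described, offered here only as an illustration and not as a method proposed in this note, is an exact binomial (Clopper-Pearson type) interval applied to the combined first two classes. Their probability is 3p₁, and 10 of the 17 individuals fall in them. The sketch finds the limits for 3p₁ by bisection on the binomial tail sums and divides them by 3. The choice α = 0.05 and the function names are assumptions.

```python
from math import comb

def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi ** j * (1 - pi) ** (n - j) for j in range(k + 1))

def bisect_root(f, lo, hi, iterations=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_interval(x, n, alpha=0.05):
    """Limits (pi_L, pi_U) with P(X >= x | pi_L) = P(X <= x | pi_U) = alpha / 2."""
    lower = 0.0 if x == 0 else bisect_root(
        lambda pi: (1 - binom_cdf(x - 1, n, pi)) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if x == n else bisect_root(
        lambda pi: binom_cdf(x, n, pi) - alpha / 2, 0.0, 1.0)
    return lower, upper

lo3, hi3 = exact_binomial_interval(10, 17)  # interval for 3 * p1
print(f"{lo3 / 3:.4f} <= p1 <= {hi3 / 3:.4f}")
```

The resulting interval is wide, which reflects how little 17 individuals can say about p₁.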

In this twofold principle we have gone beyond that involved in the simple choice of
p₁ by maximizing the likelihood. That method may form a part of the process of determin-
ing the interval $(\underline{p}_1, \bar{p}_1)$, but it is now the means to an end, not the end itself.

When we come to the more complex problem of curve fitting, it is far less clear what
result will be most useful to the practical worker. Starting from a given functional equation

$$y = f(x;\ \theta_1, \theta_2, \ldots, \theta_c)$$

involving c unknown parameters θ, it is clear that a possible principle of estimation is to
assign to the parameters the values which, if they were the population values, would make
the occurrence of the observed sample more likely than any other set of parametric values.
But many have felt that there is something remote about the abstract conception involved.
If the maximum likelihood procedure leads to estimates T_i of θ_i (i = 1, 2, ..., c) which in
random sampling have smaller standard errors than any other form of estimates T′_i, a
result has been reached with a more direct practical appeal. The achievement would be
greater still if a method were available for determining lower and upper limits $\underline{T}_i$ and $\bar{T}_i$ so
that the statement

$$\underline{T}_i \le \theta_i \le \bar{T}_i \qquad (i = 1, 2, \ldots, c)$$

could be made jointly with regard to the c parameters, with a given risk of error. But in the
present state of development of the theory of maximum likelihood, as applied to the fitting
of frequency curves to samples of finite size, can it be said that such results have been
achieved?

It must also be remembered that in so far as the statistician wishes to use his frequency
curve for graduation purposes, the agreement between observation and fitted curve through-
out the range of significant frequency may make a more direct and simpler appeal to him
than any information regarding the values of the parameters, θ_i. The quantity

$$\chi^2 = \sum_s \frac{(n_s - m_s)^2}{m_s},$$

where n_s and m_s are the observed and fitted frequencies in the sth group,
summing up the relative discrepancy, may be more closely correlated with his conception of
goodness of fit than any measure based on the likelihood, L, or on the reliability of the
estimated parameters.

Finally, the question of practical utility must play a dominating part; at present the 
method of fitting by moments, making where necessary certain empirical adjustments, does 
provide a practical working tool. Until far more exploratory work has been carried out on 
the application of the method of maximum likelihood in fitting frequency curves, it is 
quite impossible to attempt any assessment of its final value. 

It was some of these considerations, expressed perhaps in a different form, that I 
believe my father had in mind when challenging the claim that the method of maximum 
likelihood is the only efficient method of fitting frequency curves. 
(iii) A Note on Unbiased Limits for the Correlation Coefficient
By F. N. DAVID 


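We shall assume that a sample of size n has been randomly drawn from a normal bivariate
population such as

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}$$

and that we are interested in ρ, the coefficient of correlation between x and y in the population.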
It is possible that we may require answers to two questions: 

(1) Given the sample of size n, are these observations consistent with the hypothesis that
ρ = ρ₀, where ρ₀ is some specified value?

(2) Given the sample of size n, how may we calculate ρ₁ and ρ₂ so that, subject to the risk
which we are willing to undertake, the interval ρ₁ to ρ₂ will cover the true population value as
often as possible?

There are other questions which we may ask, and these will be dealt with fully in the intro-
duction to ‘Tables of the Ordinates and Probability Integral of the Distribution of r’, which
is shortly to be published. In the present note we shall confine ourselves solely to those
questions in which a consideration of “bias” is important.

The methods of answering these questions may be illustrated with the help of the
diagram on p. 157. If for a given n for each value of ρ in the range −1 to +1 we
calculate from the probability distribution of r, say p_n(r | ρ), limits r₁ and r₂ such that

$$\int_{-1}^{r_1} p_n(r \mid \rho)\,dr = \alpha_1, \qquad \int_{r_2}^{+1} p_n(r \mid \rho)\,dr = \alpha_2, \qquad (2)$$

$$\alpha_1 + \alpha_2 = \alpha, \qquad (3)$$

then the points (r₁, ρ) and (r₂, ρ) will fall on two curves enclosing a lozenge-shaped belt as
shown.* Accordingly when dealing with question (1), if we decide to reject the hypothesis
tested, i.e. that ρ = ρ₀, whenever the point (r, ρ₀) falls outside the belt, we shall run a risk
equal to α of rejecting the hypothesis when it is true. Question (2) may be answered with the
aid of the same diagram. The point (r, ρ = 0) is plotted and a line parallel to the axis of ρ is
drawn through it. Suppose that this line cuts the belt in the points (r, ρ₁) and (r, ρ₂). Then we
know that the interval ρ₁ to ρ₂ will cover the true value of ρ in the population in 100(1 − α)
per cent of cases (Neyman, 1934). It will be noticed that the risk of error, α, would re-
main the same if equation (3) held, but not equation (2). Thus it is possible to obtain an
infinite variety of belts satisfying (3) by following different principles in the determination of
r₁ and r₂, or α₁ and α₂.

Neyman & Pearson (1936), when discussing questions similar to question (1), showed that
in certain skew distributions the limits obtained by taking equal tail areas led to a curious
anomaly. In such cases they found that the hypothesis tested was more likely to be rejected
when it was true than when in fact an alternative hypothesis was true. A test leading to such
consequences Neyman & Pearson termed “biased”. It is the object of the present note to
find unbiased limits for r, and to compare them with the limits found by taking equal tail
areas.

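The probability distribution of r, for any n and ρ, may be written as follows:

$$p_n(r \mid \rho) = \frac{(1-\rho^2)^{\frac{n-1}{2}}(1-r^2)^{\frac{n-4}{2}}}{\pi\,(n-3)!}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right). \qquad (4)$$

Following the procedure of Neyman & Pearson we see that an unbiased test will be obtained
by solving equations (5) and (6) for r₁ and r₂:

$$\int_{r_1}^{r_2} p_n(r \mid \rho)\,dr = 1 - \alpha, \qquad (5)$$

$$\frac{d}{d\rho}\int_{r_1}^{r_2} p_n(r \mid \rho)\,dr = 0, \qquad (6)$$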


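where α is chosen, as is customary, according to the risk we are willing to undertake of
rejecting the hypothesis as false when it is true. Differentiating (6) with respect to ρ we get

$$-\frac{\rho(n-1)(1-\rho^2)^{\frac{n-3}{2}}}{\pi\,(n-3)!}\int_{r_1}^{r_2}(1-r^2)^{\frac{n-4}{2}}\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)dr + \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}\int_{r_1}^{r_2} r(1-r^2)^{\frac{n-4}{2}}\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)dr = 0,$$

that is

$$\frac{\rho(n-1)}{1-\rho^2}\int_{r_1}^{r_2} p_n(r \mid \rho)\,dr = \frac{(1-\rho^2)^{\frac{n-1}{2}}}{\pi\,(n-3)!}\int_{r_1}^{r_2} r(1-r^2)^{\frac{n-4}{2}}\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)dr. \qquad (7)$$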


* The diagram illustrates approximately the case n = 10, α = 0.05; charts of these “confidence
belts” for varying n and α will be given in the publication referred to above.
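Integrate (7) by parts, remembering that

$$\frac{d}{dr}\left\{\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)\right\} = \rho\,\frac{d^{\,n}}{d(r\rho)^{n}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right),$$

and we get

$$\int_{r_1}^{r_2} r(1-r^2)^{\frac{n-4}{2}}\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)dr = -\frac{1}{n-2}\left[(1-r^2)^{\frac{n-2}{2}}\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)\right]_{r_1}^{r_2} + \frac{\rho}{n-2}\int_{r_1}^{r_2}(1-r^2)^{\frac{n-2}{2}}\frac{d^{\,n}}{d(r\rho)^{n}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right)dr. \qquad (8)$$

From (4) we see that we may write

$$p_{n+1}(r \mid \rho) = \frac{(1-\rho^2)^{\frac{n}{2}}(1-r^2)^{\frac{n-3}{2}}}{\pi\,(n-2)!}\,\frac{d^{\,n-1}}{d(r\rho)^{n-1}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right), \qquad (9)$$

$$p_{n+2}(r \mid \rho) = \frac{(1-\rho^2)^{\frac{n+1}{2}}(1-r^2)^{\frac{n-2}{2}}}{\pi\,(n-1)!}\,\frac{d^{\,n}}{d(r\rho)^{n}}\left(\frac{\arccos(-\rho r)}{\sqrt{1-\rho^2 r^2}}\right). \qquad (10)$$

Substituting (10), (9) and (8) in (7) we get

$$\int_{r_1}^{r_2} p_n(r \mid \rho)\,dr = \int_{r_1}^{r_2} p_{n+2}(r \mid \rho)\,dr - \frac{\sqrt{1-\rho^2}}{\rho(n-1)}\left[\sqrt{1-r_2^2}\,p_{n+1}(r_2 \mid \rho) - \sqrt{1-r_1^2}\,p_{n+1}(r_1 \mid \rho)\right], \qquad (11)$$

or, making use of (5),

$$\int_{r_1}^{r_2} p_{n+2}(r \mid \rho)\,dr = 1 - \alpha + \frac{\sqrt{1-\rho^2}}{\rho(n-1)}\left[\sqrt{1-r_2^2}\,p_{n+1}(r_2 \mid \rho) - \sqrt{1-r_1^2}\,p_{n+1}(r_1 \mid \rho)\right]. \qquad (12)$$

Hence solving for r₁ and r₂ from (5) and (12) we should get the r₁ and r₂ which are the
unbiased limits for r for a given n and ρ.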

An algebraical solution of (5) and (12) proved elusive. Accordingly it was decided to solve
equations (5) and (12) for r₁ and r₂ by means of trial and error, given one specific size of sample.
The size of sample chosen was n = 10, because it is unlikely that the correlation
coefficient would be worked out for a sample of less than 10 observations. For all samples of
more than 10 observations the bias, if any, would be expected to be less than that for the
sample of 10, since the distribution curves of r tend slowly to normality.

The method of procedure was as follows. 1 − α was chosen as 0.95, and the first value of ρ
to be considered was ρ = 0.5. Using the unpublished tables (David, 1937) of the probability
integral of r, r₁ and r₂ were found by backward interpolation into the tables for

α₁ = α₂ = 0.025,   n = 10,   ρ = 0.5,   α = 0.05,

α₁ and α₂ being the lower and upper tail areas. Equal tail areas give r₁ = −0.1556, r₂ = 0.8673.
The right-hand side of equation (11) was evaluated using these values of r₁ and r₂. Instead of
being equal to 0.95, as it would have been had there been no bias, it was equal to 0.9512.
Several other values of α₁ and α₂ were tried. Finally, taking α₁ = 0.0245 and α₂ = 0.0255,
backward interpolation using equation (5) gave r₁ = −0.1591, r₂ = 0.8664. The right-hand side
of (11), evaluated with these values of r₁ and r₂, was found to equal 0.95. Hence the r₁ and r₂
given by α₁ = 0.0245 and α₂ = 0.0255 are the unbiased limits for r, and it is seen that the
bias is very small.
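The trial-and-error procedure can be automated. The sketch below is one possible version, not the original computation: it tabulates the probability integral of r for n and n + 2 on a fine grid (standing in for the unpublished tables), finds r₁ and r₂ by inverse linear interpolation (the analogue of backward interpolation), and replaces the trial of several (α₁, α₂) pairs by bisection on α₁, with α₂ = α − α₁, until the right-hand side of (11) equals 1 − α. It assumes the integral form of (4) used above, ρ > 0 (so that the factor 1/ρ in (11) is finite), and that the right-hand side of (11) increases with α₁ over the bracket searched; all function names are hypothetical.

```python
import bisect
import math


def make_density(n, rho, w_max=6.0, steps=300):
    """Return r -> p_n(r | rho), via the integral form of (4) and Simpson's rule in w."""
    h = w_max / steps
    nodes = [(math.cosh(i * h), (1 if i in (0, steps) else 4 if i % 2 else 2) * h / 3)
             for i in range(steps + 1)]
    const = (n - 2) / math.pi * (1 - rho * rho) ** ((n - 1) / 2)

    def p(r):
        if abs(r) >= 1.0:
            return 0.0
        s = sum(wt / (c - rho * r) ** (n - 1) for c, wt in nodes)
        return const * (1 - r * r) ** ((n - 4) / 2) * s

    return p


def make_table(p, m=1200):
    """Probability integral P(r <= x) on an even grid over [-1, 1] (trapezoidal rule)."""
    xs = [-1.0 + 2.0 * i / m for i in range(m + 1)]
    ps = [p(x) for x in xs]
    cum = [0.0]
    for i in range(m):
        cum.append(cum[-1] + 0.5 * (ps[i] + ps[i + 1]) * (xs[i + 1] - xs[i]))
    return xs, cum


def interp(xs, ys, x):
    """Linear interpolation of ys at x; xs must be non-decreasing."""
    i = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def unbiased_limits(n, rho, alpha=0.05, iters=40):
    """Return (alpha1, (r1, r2), RHS of (11) at equal tails, equal-tail (r1, r2))."""
    xs, cum_n = make_table(make_density(n, rho))
    _, cum_n2 = make_table(make_density(n + 2, rho))
    p_n1 = make_density(n + 1, rho)
    k = math.sqrt(1 - rho * rho) / ((n - 1) * rho)

    def limits(a1):
        # Backward interpolation: invert the table of P(r <= x) at both tail areas.
        return interp(cum_n, xs, a1), interp(cum_n, xs, 1 - (alpha - a1))

    def rhs11(a1):
        r1, r2 = limits(a1)
        inner = interp(xs, cum_n2, r2) - interp(xs, cum_n2, r1)
        edge = math.sqrt(1 - r2 * r2) * p_n1(r2) - math.sqrt(1 - r1 * r1) * p_n1(r1)
        return inner - k * edge

    lo, hi = 0.2 * alpha, 0.8 * alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rhs11(mid) > 1 - alpha:
            hi = mid  # right-hand side of (11) too large: alpha1 is too big
        else:
            lo = mid
    a1 = 0.5 * (lo + hi)
    return a1, limits(a1), rhs11(alpha / 2), limits(alpha / 2)


print(unbiased_limits(10, 0.5))
```

With n = 10 and ρ = 0.5 the equal-tail limits and the right-hand side of (11) should come out close to the figures quoted above, and α₁ close to 0.0245; small differences are to be expected from the grid here and from interpolation in the original tables. Calling `unbiased_limits(10, 0.8)` gives the corresponding second case.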

The distribution curve of r for ρ = 0 is symmetrical, so in this case the limits r₁ and r₂
obtained by taking equal tail areas will also be the unbiased limits: the probability of the
symmetrical interval (r₁, r₂) is then an even function of ρ, so its derivative (6) vanishes at
ρ = 0. The distribution curves of r gradually become asymmetrical as ρ increases, and it
therefore seems reasonable to suppose that the bias gradually increases with the asymmetry.
We have investigated the case ρ = 0.5. Let us now consider the bias when ρ = 0.8. The same
method was carried out as before, and we obtained


Equal tail areas:   α₁ = 0.025,    α₂ = 0.025,    r₁ = +0.4003,   r₂ = 0.9550
Unbiased limits:    α₁ = 0.0242,   α₂ = 0.0258,   r₁ = +0.3959,   r₂ = 0.9545
We see that the difference between the tail areas is greater than for ρ = 0.5, but that, for
the r's, the difference between the unbiased values and those obtained from equal tail areas
is only slightly increased for r₁ and slightly decreased for r₂. The area under the
distribution curve is unity for both ρ = 0.5 and ρ = 0.8, but the standard deviation of the
curve for ρ = 0.8 is less than that for ρ = 0.5. Hence an alteration in the tail areas of the
curve for ρ = 0.8 will mean much less change in the limits for r than the same alteration in
the tail areas of the curve for ρ = 0.5. Our result accordingly seems reasonable, and we may
conclude that the unbiased limits for r follow very closely those limits which are found by
taking equal tail areas.
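The argument about standard deviations can be made quantitative with a first-order rule: moving a small area Δα across a limit shifts that limit by about Δα divided by the density at the limit. The sketch below (same assumed integral form of (4), Simpson's rule, hypothetical names) computes the standard deviation of r for n = 10 at ρ = 0.5 and ρ = 0.8, and applies the rule to the equal-tail limits and tail-area changes quoted above.

```python
import math


def corr_density(r, rho, n, w_max=6.0, steps=300):
    """p_n(r | rho) from (4), via (n-2)! * int_0^inf dw / (cosh w - rho*r)^(n-1)."""
    if abs(r) >= 1.0:
        return 0.0
    h = w_max / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight / (math.cosh(i * h) - rho * r) ** (n - 1)
    return ((n - 2) / math.pi) * (1 - rho * rho) ** ((n - 1) / 2) \
        * (1 - r * r) ** ((n - 4) / 2) * total * h / 3.0


def mean_sd(rho, n, m=800):
    """Mean and standard deviation of r, by Simpson's rule over (-1, 1)."""
    h = 2.0 / m
    s0 = s1 = s2 = 0.0
    for j in range(m + 1):
        r = -1.0 + j * h
        w = (1 if j in (0, m) else 4 if j % 2 else 2) * corr_density(r, rho, n)
        s0 += w
        s1 += w * r
        s2 += w * r * r
    mean = s1 / s0
    return mean, math.sqrt(s2 / s0 - mean * mean)


# (rho, equal-tail r1, equal-tail r2, change in alpha1, change in alpha2), from the text.
cases = [(0.5, -0.1556, 0.8673, 0.0005, 0.0005), (0.8, 0.4003, 0.9550, 0.0008, 0.0008)]
for rho, r1, r2, d1, d2 in cases:
    _, sd = mean_sd(rho, 10)
    shift1 = d1 / corr_density(r1, rho, 10)  # first-order downward shift of r1
    shift2 = d2 / corr_density(r2, rho, 10)  # first-order downward shift of r2
    print(f"rho={rho}: sd(r)={sd:.4f}, shift in r1 ~ {shift1:.4f}, shift in r2 ~ {shift2:.4f}")
```

Dividing each shift by its Δα gives the change in a limit per unit change of tail area, which is the quantity the argument about standard deviations concerns.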

In answer to question (1), we see that in testing the hypothesis ρ = ρ₀, with admissible
alternative hypotheses −1 < ρ < ρ₀ and ρ₀ < ρ < +1, if we used the limits for r obtained from
equal tail areas we should reject the hypothesis tested, when it is true, with the prescribed
risk, but there is a possibility that we should reject some other, wrong, hypothesis even
less frequently.

In answer to question (2), we may note that between the points (r₁, ρ = 0) and (r₂, ρ = 0)
the equal tail area interval for ρ is actually narrower than the unbiased interval, while from
(−1, ρ = 0) to (r₁, ρ = 0) and from (r₂, ρ = 0) to (+1, ρ = 0) the intervals practically
coincide. It might therefore be asked why the unbiased interval should be chosen. The risk of
the interval ρ₁ to ρ₂ failing to cover the true population value of ρ is fixed at 0.05 in both
cases, and since the unbiased interval is the greater over a very large range of r, why not
choose the other? The answer is found in the definition of bias. The unbiased interval is
chosen to cover the true population value 95 times in 100, and any other, wrong, value fewer
times. The equal tail area interval is also chosen to cover the true population value 95 times
in 100, but it may cover some other, wrong, value more times than it does the true value. In
the case of the r-distribution this discussion is, of course, theoretical, since the bias is so
small, but it is conceivable that the point will prove important in other distributions.

It was expected that the bias in the distribution of r would prove to be small because of
Prof. Fisher’s z′ transformation (Fisher, 1921) for r. This transformation is nearly perfect,
and transforms the asymmetrical curves of r into a series of very nearly normal curves. We
should take equal tail areas from these normal curves in order to get unbiased limits for z′,
and therefore, since the transformation is monotonic and so preserves tail areas, on
transforming back we should expect to take equal tail areas from the r-curves. Since the
transformation is not quite perfect, we should expect a slight bias, which is what is actually
found.
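The z′ argument can be illustrated by computing approximate equal-tail limits for r directly from the transformation. The sketch below assumes the usual large-sample approximation in which z′ = tanh⁻¹ r is normal with variance 1/(n − 3) and mean tanh⁻¹ ρ + ρ/(2(n − 1)); the function name and the optional mean correction are illustrative choices. Because tanh is strictly increasing, equal tail areas on the z′ scale map to equal tail areas on the r scale.

```python
import math
from statistics import NormalDist


def z_equal_tail_limits(rho, n, alpha=0.05, mean_correction=True):
    """Approximate equal-tail limits (r1, r2) for r from Fisher's z' transformation.

    Assumes z' = atanh(r) ~ Normal(atanh(rho) [+ rho / (2(n-1))], 1 / (n - 3)).
    Equal tail areas on the z' scale transform back to equal tail areas on the
    r scale because tanh is strictly increasing.
    """
    mean = math.atanh(rho)
    if mean_correction:
        mean += rho / (2 * (n - 1))
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    return math.tanh(mean - half_width), math.tanh(mean + half_width)


print(z_equal_tail_limits(0.5, 10))
print(z_equal_tail_limits(0.5, 10, mean_correction=False))
```

For n = 10 and ρ = 0.5 the corrected version gives roughly (−0.162, 0.866), against the equal tail area limits −0.1556 and 0.8673 obtained above from the tables; the remaining discrepancy reflects the fact that z′ is only approximately normal at so small a sample size.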